Customer Stories
Panasas selected for Darwin supercomputer at Cambridge University
posted on 13 June 2008 07:04
Users of the Darwin supercomputer at Cambridge University are getting much faster job throughput and higher utilization from a Panasas AS5000 parallel file system, which effectively replaces both a previous parallel scratch file system and an NFS filestore built on Dell MD1000 drives.
Darwin is a supercomputer operated by the University of Cambridge’s High Performance Computing Service (HPCS), at Daresbury Laboratory under the management of Dr. Paul Calleja, HPCS director. The system is a platform for diverse software applications such as bioinformatics, computational fluid dynamics, computational physics and mathematics, and is available to users in all departments across Cambridge University.
It is a cluster of more than 1,000 Dell PowerEdge 1955 compute nodes (paired 3GHz Xeon 5160 Woodcrests and 8GB of memory) with 2,340 cores, running Red Hat Linux and interlinked by a QLogic InfiniBand fabric. The cluster is organized into nine computational units (CUs) with each CU consisting of 64 nodes in two racks, providing 256 cores.
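The per-unit figures can be checked with a little arithmetic. The short Python sketch below is only an illustration: it assumes each node carries two dual-core Xeon 5160 processors (the "paired" Woodcrests), and the function name is hypothetical. It shows that nine CUs account for slightly fewer cores than the quoted 2,340; the article does not say where the remainder sits.

```python
# Figures quoted in the article.
CU_COUNT = 9
NODES_PER_CU = 64
QUOTED_CORES_PER_CU = 256
QUOTED_TOTAL_CORES = 2340

# Assumption: two sockets per node, each a dual-core Xeon 5160.
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 2


def cu_totals(cu_count, nodes_per_cu, sockets_per_node, cores_per_socket):
    """Return (nodes, cores) summed over all computational units."""
    cores_per_node = sockets_per_node * cores_per_socket
    nodes = cu_count * nodes_per_cu
    return nodes, nodes * cores_per_node


nodes_in_cus, cores_in_cus = cu_totals(
    CU_COUNT, NODES_PER_CU, SOCKETS_PER_NODE, CORES_PER_SOCKET
)
cores_per_cu = cores_in_cus // CU_COUNT

# Cores in the quoted total that the nine CUs do not account for.
cores_outside_cus = QUOTED_TOTAL_CORES - cores_in_cus

print(f"Cores per CU:       {cores_per_cu}")
print(f"Nodes in nine CUs:  {nodes_in_cus}")
print(f"Cores in nine CUs:  {cores_in_cus}")
print(f"Cores unaccounted:  {cores_outside_cus}")
```

Under that assumption the 256-cores-per-CU figure is consistent with 64 four-core nodes per unit.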
In November 2006, Darwin was ranked as number 20 in the Top500 list of the world's most powerful supercomputers, delivering 27 trillion floating point operations per second.
The InfiniBand fabric, providing the lowest latency, the highest message rate and highest effective bandwidth of any cluster interconnect available, is used for Ethernet links and storage access for the compute nodes. There is 46TB of local cluster disk capacity and 60TB of filesystem storage provided by Dell PowerVault MD1000 disk arrays, with 15,000rpm, 73GB SAS drives, connected to the cluster network over 10 Gigabit Ethernet links. The storage pool was managed by the TerraGrid parallel filesystem.
Derek Burke, Panasas' EMEA commercial director, said: "(HPCS) is tightly monitored on the utilization of the system, first, and secondly on users' job run times. The system has to be processing as much of the time as possible to keep utilization high. The Panasas system helps keep utilization high because disk I/O waits are reduced."
There is a UK government edict that the system has to pay for itself, which means its compute nodes must be kept working as much of the time as possible. The HPCS must recover all purchasing, running, and future upgrade costs by means of a precise per-use accounting system.
The HPCS found that the utilization and job throughput levels were not high enough. One issue was that message passing interface (MPI) jobs delivered just 20-25 percent, and sometimes just 10-15 percent, utilization. Calleja and his team identified disk I/O as the bottleneck. It is a big, big problem connecting a thousand-plus processors to tens of terabytes of storage and getting the right data to each processor in time to keep the whole cluster processing as fast as it can.
They looked for a faster storage system and found it in Panasas and its parallel file system, which offers simultaneous storage access to every compute node in the cluster.
The HPCS has deployed the Panasas ActivStor AS5000 product with the PanFS parallel file system. Panasas has a strong presence in the HPC and supercomputing markets; it is used, for example, in Roadrunner, currently the world's fastest supercomputer.
Dr. Calleja said: "It is critical that our centralized service provides the ideal platform for software applications with different compute, memory and storage requirements. The Panasas PanFS parallel file system significantly boosts the performance of applications that take advantage of parallel I/O.
"Each node on our cluster runs the Panasas DirectFlow client software and has a physical data path to the storage, giving us the flexibility to employ parallel I/O across all nodes for single jobs or many individual jobs performing I/O simultaneously. Our users are enjoying an increased job throughput following the deployment of the Panasas solution."
Burke said: "Before Panasas was used there was a parallel file system for scratch I/O and NFS NAS (network-attached storage) for home directories, etc. The centre needed additional storage capacity and decided to go with Panasas. Now it doesn't need the parallel (scratch) filesystem as Panasas does both the scratch I/O and home directory storage (functions)."
Using the Panasas parallel I/O product has meant that the HPCS can fulfill its requirements for both performance and data availability with a single, dependable platform that provides increased capacity for home directories as well as better application performance and a lower management burden. Dr. Calleja said: "The reliability of the Panasas solution benefits users with increased uptime, and its manageability has allowed us to reduce our administrative overhead."
By using the Panasas product, HPCS users are experiencing a significant reduction in the time it takes to complete jobs, as well as increased system uptime. This allows researchers to complete their projects in less time and will contribute to Cambridge University’s position as a pre-eminent research and educational institution on the world stage.
[Chris Mellor.]
tags: HPC parallel




