The University of Sheffield
Sheffield-WRGRID

iceberg HPC node

The High Performance Computing server iceberg is the Sheffield node of the White Rose Computing Grid.

Hardware

Following the Autumn 2011 upgrade, iceberg has the following hardware:

  • Two head nodes (for logging in to iceberg)


  • AMD-based cluster containing:
    • 96 nodes each with 4 cores and 16 GB of memory
    • 31 nodes each with 8 cores and 32 GB of memory
    Therefore total AMD cores = 632, total memory = 2528 GB
    The 8-core nodes are connected to each other via 16 Gbit/s InfiniBand for MPI jobs
    The 4-core nodes are connected via much slower 1 Gbit/s Ethernet for MPI jobs
    Scratch space on each node is 400 GB


  • Intel-based cluster containing:
    • 71 nodes each with 12 cores and 24 GB of memory (i.e. 2 × 6-core Intel X5650)
    • 5 nodes each with 12 cores and 48 GB of memory
    • 8 Nvidia Tesla Fermi M2070 GPU units for GPU programming
    Therefore total Intel CPU cores = 912, total memory = 1920 GB
    Scratch space on each node is 400 GB
    Each GPU unit contains 448 cores running at 1.15 GHz and 6 GB of GDDR5 memory.
    Therefore total GPU cores = 3584, total GPU memory = 48 GB
    Each GPU unit is capable of about 1 TFlop of single-precision floating-point performance, or 0.5 TFlops at double precision, giving a maximum GPU processing power of up to 8 TFlops in total (single precision).
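
The totals quoted above can be re-derived from the per-node figures. The following is a minimal sketch in Python; the variable and function names are illustrative only and are not part of any iceberg tooling. Note that the per-node Intel figures as listed sum to 1944 GB, slightly more than the 1920 GB quoted, and this page does not say which figure is the accurate one.

```python
# Node groups as (node_count, cores_per_node, memory_gb_per_node),
# copied from the hardware lists above.
amd_nodes = [(96, 4, 16), (31, 8, 32)]
intel_nodes = [(71, 12, 24), (5, 12, 48)]


def totals(groups):
    """Return (total_cores, total_memory_gb) for a list of node groups."""
    cores = sum(count * cores_each for count, cores_each, _ in groups)
    memory_gb = sum(count * mem_each for count, _, mem_each in groups)
    return cores, memory_gb


amd_cores, amd_memory_gb = totals(amd_nodes)
intel_cores, intel_memory_gb = totals(intel_nodes)

# GPU figures: 8 units, each with 448 cores and 6 GB of memory.
gpu_units, cores_per_gpu, memory_gb_per_gpu = 8, 448, 6
gpu_cores = gpu_units * cores_per_gpu
gpu_memory_gb = gpu_units * memory_gb_per_gpu

print("AMD:  ", amd_cores, "cores,", amd_memory_gb, "GB")
print("Intel:", intel_cores, "cores,", intel_memory_gb, "GB")
print("GPU:  ", gpu_cores, "cores,", gpu_memory_gb, "GB")
```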

Filestore

  • 45 TBytes of NFS-mounted filestore providing users with storage in the /home and /data areas
  • 80 TBytes of InfiniBand-connected parallel filestore providing storage in the /fastdata area

Filestore Allocations

By default users get:

  • 5 GBytes of storage in their /home area
  • 50 GBytes of storage in their /data area
  • No limit is currently set on the /fastdata area, but files that have not been modified for 3 months will be deleted.

From time to time we shall review our data storage and allocation policies in the light of usage and inform our users of any changes.
It is strongly recommended that anyone wishing to use the /fastdata area creates a subdirectory with the same name as their username for storing their data. 
The /fastdata area is faster to access from the new (INTEL-based) nodes where the new infiniband connections are in use. 
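
As an illustration of the two /fastdata points above (a per-user subdirectory, and deletion of files left unmodified for 3 months), here is a minimal Python sketch. The function names are hypothetical, and treating "3 months" as 90 days is an assumption; the exact cut-off used by the clean-up is not stated here.

```python
import getpass
import time
from pathlib import Path


def ensure_user_dir(root="/fastdata", username=None):
    """Create (if needed) and return a subdirectory named after the user."""
    name = username if username is not None else getpass.getuser()
    user_dir = Path(root) / name
    user_dir.mkdir(exist_ok=True)
    return user_dir


def stale_files(directory, max_age_days=90, now=None):
    """List files under `directory` not modified in the last `max_age_days`.

    The 90-day default is an assumed reading of "3 months".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path
        for path in Path(directory).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )
```

Calling ensure_user_dir() with no arguments on iceberg would create /fastdata/<your username>; passing the result to stale_files() then lists the files that are candidates for deletion under the assumed 90-day rule.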

Filestore recovery and backup policies

If you lose files through accidental deletion, overwriting, etc. and wish us to recover them, let us know as soon as possible to increase the chances of successful recovery.

  • Users' /home areas are fully backed up to allow recovery of lost data.
  • The /data and /fastdata areas are not backed up.
  • Because the /data areas are mirrored, it is usually possible to recover lost or deleted files from them, provided we are informed soon after the incident.
  • It is not possible to recover lost or deleted files from the /fastdata areas.

Software and Operating System

Users normally log into a head node and then use one or more of the worker nodes to run their jobs. Scheduling of users' jobs on the worker nodes is managed by the Sun Grid Engine software. Jobs can be run interactively (qsh) or submitted as batch jobs (qsub).
  • The operating system on all nodes is 64-bit Scientific Linux, which is Red Hat based
  • The Sun Grid Engine for batch and interactive job scheduling
  • Many applications, compilers, libraries and parallel processing tools. See the section on Software.
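
To make the batch route concrete, the sketch below builds a minimal Sun Grid Engine job script in Python and shows how it could be handed to qsub. The helper names and the job name are hypothetical. The directives used (-N, -cwd, -j y) are standard Grid Engine options, but choosing them here is an assumption; any resource requests or queue names that iceberg requires are not covered on this page.

```python
import subprocess


def make_job_script(job_name, commands):
    """Return the text of a minimal Grid Engine batch script."""
    lines = [
        "#!/bin/bash",
        f"#$ -N {job_name}",  # job name shown by the scheduler
        "#$ -cwd",            # run the job from the submission directory
        "#$ -j y",            # merge standard error into standard output
        "",
    ]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


def submit(script_path):
    """Submit a script with qsub; only works where qsub is installed."""
    result = subprocess.run(
        ["qsub", str(script_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


script = make_job_script("hello", ['echo "Running on $(hostname)"'])
print(script)
```

The script text would be saved to a file on a head node and passed to submit(); for interactive work the equivalent step is simply to run qsh from the head node.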

Iceberg: Summary of Facts & Figures

  • Processor cores: 1544 CPU cores + 8 GPUs
  • Performance: ~15 TFLOPS
  • Total main memory: 4448 GB
  • Filestore: 10 TB
  • Temporary disk space: 125 TB
  • Physical size: 10 racks
  • Power consumption: 67 kW