**PROTEUS** is a cluster of nodes with x86_64 Intel Xeon processors connected by two communication networks (1GbE and InfiniBand FDR). It also has 2 NVIDIA Tesla graphics cards for GPGPU.

For storage, it has a Ceph distributed file system for general use and a Lustre-based file system for applications requiring higher bandwidth.

## PROCESSOR FAMILIES

The current composition of **PROTEUS** is the result of a series of extensions and improvements. The new nodes have been added to the existing ones, ensuring that they can coexist smoothly.

Currently, we have the following families of processors (labelled with the nicknames we have assigned to them):

Nickname | Architecture (model) | #Nodes | #Cores/Node | RAM (GB) | Year |
---|---|---|---|---|---|
Artemis | Clovertown (E5345) | 10 | 8 | 4 | 2007 |
Calypso | Harpertown (E5410) | 51 | 8 | 8 | 2008 |
Kratos | Westmere (X5690) | 42 | 12 | 48 | 2012 |
Hermes v1 | Haswell (E5-2660 v3) | 13 | 20 | 64 | 2015 |
Hermes v2 | Broadwell (E5-2640 v4) | 4 | 20 | 64 | 2015 |
Metis v1 | Skylake (Gold 6132) | 8 | 28 | 96 | 2019 |
Metis v2 | Cascade Lake (Gold 6226) | 40 | 24 | 96 | 2019 |

We also have two nodes with a larger amount of memory, for applications that need it:

Nickname | Architecture (model) | #Nodes | #Cores/Node | RAM (GB) | Year |
---|---|---|---|---|---|
Hermes00 | Haswell (E5-2698 v3) | 1 | 32 | 256 | 2015 |
Metis00 | Cascade Lake (Gold 6226) | 1 | 24 | 386 | 2019 |

## PERFORMANCE COMPARISON

These different families have very different performance. The same program can take different times to complete depending on the node it is running on. To help estimate execution time, a comparison of the computational power of the CPUs in each family has been made.

The computing power of a system is usually measured in FLOPS (floating-point operations per second), normally in double precision. This measure is independent of the CPU instruction set, so it can be used to compare systems with different architectures.

Today’s processors perform billions of operations per second, so the multiple GFLOPS (10^9 FLOPS) is normally used.

## THEORETICAL MAXIMUM POWER

We can calculate the theoretical computing power of a CPU if we know the details of its architecture. This value is known as the theoretical maximum or **Rpeak**. The simplest way to calculate it is to multiply the number of cores by their clock frequency and by the maximum number of double-precision floating-point operations each core can perform per cycle. Thus, for a node, the expression would be:

Rpeak (GFLOPS) = #sockets · #cores/socket · GHz · #ops/cycle

However, this is the theoretical upper limit and is very unlikely to be reached in real situations. The theoretical maximum helps us to know to what degree we are exploiting the CPU’s potential.
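As an illustration, the Rpeak expression can be written as a small Python function. This is only a sketch: the function name `rpeak_gflops` is hypothetical, and the 2.6 GHz clock and the split of a 20-core Hermes v1 node into 2 sockets of 10 cores are assumptions that are not stated on this page. The value of 16 operations per cycle is assumed to be the one that applies to Haswell.

```python
def rpeak_gflops(sockets, cores_per_socket, ghz, ops_per_cycle):
    """Theoretical peak (Rpeak) in GFLOPS, double precision."""
    return sockets * cores_per_socket * ghz * ops_per_cycle


# Hermes v1 (Haswell, E5-2660 v3), 20 cores per node.
# ASSUMPTIONS: 2 sockets x 10 cores, 2.6 GHz nominal clock,
# 16 double-precision operations per cycle.
hermes_v1_core = rpeak_gflops(1, 1, 2.6, 16)
hermes_v1_node = rpeak_gflops(2, 10, 2.6, 16)

print(f"Hermes v1 single core: {hermes_v1_core:.1f} GFLOPS")
print(f"Hermes v1 full node:   {hermes_v1_node:.1f} GFLOPS")
```

The single-core figure is obtained by setting both the socket count and the cores per socket to 1.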

## PERFORMANCE TESTING (BENCHMARKS)

Benchmarks are programs designed to measure the characteristics of a computer system. They come in many types and can measure individual aspects of the machine, such as the storage system, memory bandwidth or graphics processing, or the system as a whole.

In this scientific computing environment, the most relevant aspects are the computing power of the CPU and the bandwidth to main memory. To evaluate them, we have run two benchmarks that are widely used in HPC environments: **Linpack** and **HPCG**.

Linpack is a library that solves linear algebra problems. It has been widely used as a CPU benchmark. **HPL** (High Performance Linpack) is a parallel and optimised version of it that is used to measure the performance of supercomputers that appear on the **TOP500** list.

**HPCG** (High Performance Conjugate Gradient) is a newer benchmark, also used on supercomputers, created to model the way real programs access main memory. Memory accesses often act as a bottleneck, so this benchmark reaches only a fraction of the CPU’s raw power. It complements Linpack, which is more processor intensive.
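To give an idea of the kind of computation HPCG is named after, here is a minimal sketch of the plain conjugate gradient iteration in Python. It is not the HPCG benchmark: HPCG works on a large sparse system and uses a preconditioner, while this example uses a tiny dense matrix and no preconditioning. All names (`matvec`, `dot`, `conjugate_gradient`) are illustrative. Each iteration consists of one matrix-vector product plus a few dot products and vector updates, so there is little arithmetic per value read from memory.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x = 0 initially
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break            # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite example system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)
```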

## PERFORMANCE

The following tables show the results obtained by calculating the theoretical maximum and evaluating the nodes using the Linpack and HPCG benchmarks.

Two evaluations have been performed: one using a single core and the other using all the cores contained in the node.

**Theoretical maximum – Rpeak (GFLOPS)**

Applying the formula above with the number of operations per cycle of each architecture (4, 16 or 32) gives the following Rpeak values:

Nickname | Single core | Complete Node |
---|---|---|
Artemis | 9.32 | 74.56 |
Calypso | 9.32 | 74.56 |
Kratos | 13.88 | 166.56 |
Hermes v1 | 41.6 | 832 |
Hermes v2 | 38.4 | 768 |
Metis v1 | 83.2 | 2329.6 |
Metis v2 | 86.4 | 2073.6 |

**Linpack**

We used the version of HPL provided by Intel, which is specially optimised for its processors. The tests were run on a single core and on all the cores available in the node. For CPUs that support Turbo Boost, the single-core results were collected with this option both enabled and disabled ("No Turbo" column):

Nickname | Single core | % | No Turbo | % | Complete Node | % |
---|---|---|---|---|---|---|
Artemis | 8.44 | 90.5 | – | – | 50.41 | 67.6 |
Calypso | 8.87 | 95.1 | – | – | 66.09 | 88.6 |
Kratos | 14.09 | 102.5 | 13.3 | 95.8 | 151.6 | 91 |
Hermes v1 | 42.68 | 109.7 | 38.99 | 93.7 | 671.2 | 80.6 |
Hermes v2 | | | | | | |
Metis v1 | | | | | | |
Metis v2 | | | | | 1534.51 | 74 |
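The % columns appear to express each measured result as a percentage of the corresponding Rpeak. The page does not state this explicitly, so it is an assumption; most entries match it, although some single-core values with Turbo enabled do not match the ratio exactly. A minimal sketch of the calculation, using the Artemis figures from the tables above (the function name `efficiency_pct` is hypothetical):

```python
def efficiency_pct(measured_gflops, rpeak_gflops):
    """Measured performance as a percentage of the theoretical peak."""
    return 100.0 * measured_gflops / rpeak_gflops


# Artemis: Linpack results and Rpeak values taken from the tables above.
# ASSUMPTION: the % columns are computed as measured / Rpeak.
artemis_single = efficiency_pct(8.44, 9.32)
artemis_node = efficiency_pct(50.41, 74.56)

print(f"Artemis single core: {artemis_single:.1f} % of Rpeak")
print(f"Artemis full node:   {artemis_node:.1f} % of Rpeak")
```

Both results come out close to the 90.5 % and 67.6 % listed for Artemis.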

**HPCG**

This table is under construction… Sorry for the inconvenience.

Nickname | Single core | % | No Turbo | % | Complete Node | % |
---|---|---|---|---|---|---|
Artemis | – | – | | | | |
Calypso | – | – | | | | |
Kratos | | | | | | |
Hermes v1 | | | | | | |
Hermes v2 | | | | | | |
Metis v1 | | | | | | |
Metis v2 | | | | | | |