Info about performance

In computing, performance refers to the amount of work a computer carries out, and mainly covers:

  • response time
  • computing power
  • energy consumption

In scientific computing, the aspect that concerns us is computing power, or processing speed. Because different CPU architectures exist, comparing their computing power requires a measure that is independent of intrinsic characteristics such as clock speed.

This measure is the number of floating-point operations, normally in double precision, per unit of time, generally per second (FLOPS). Current processors reach rates of billions of operations per second, so performance is usually expressed in GFLOPS (10^9 FLOPS).

The PROTEUS nodes have several generations of Intel microprocessors.

Nickname   CPU          GHz    #cores   Cache (MB)   #nodes
artemis    E5345        2.33   4        8            11
calypso    E5410        2.33   4        8            54
kratos     X5690        3.46   6        12           42
hermes00   E5-2698 v3   2.3    16       40           1
hermesv3   E5-2660 v3   2.6    10       25           12
hermesv4   E5-2640 v4   2.4    10       25           4

To give researchers a reference for the relative power of these generations, performance tests have been carried out.

PERFORMANCE TESTS

As mentioned above, CPUs cannot be compared on a few features alone, because performance depends on the processor architecture. For that reason, the table above is of limited help. Knowing the architecture details (channels, number of functional units, etc.), it is possible to determine the theoretical computing capacity. This value is known as Rpeak. Although there are several ways to calculate it, we will use the simplest and most widespread formula, which multiplies the number of processors or cores by their frequency and by the maximum number of double-precision floating-point instructions they can execute per cycle. For a single node the formula is:

Rpeak = #sockets · #cores/socket · MHz · #ops/cycle
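
As an illustration, the formula can be sketched in a few lines of Python. The function name rpeak_gflops and its parameter names are hypothetical. The core count (10) and clock (2.6 GHz) are taken from the hermesv3 row of the node table, treating the table's core count as cores per socket; the values sockets=2 and ops_per_cycle=16 are assumptions made only for this example and are not stated on this page. The frequency is given in GHz so that the result comes out in GFLOPS.

```python
def rpeak_gflops(sockets, cores_per_socket, ghz, ops_per_cycle):
    """Theoretical peak of one node in GFLOPS (clock frequency given in GHz)."""
    return sockets * cores_per_socket * ghz * ops_per_cycle


# Cores and clock come from the hermesv3 row of the node table.
# ASSUMPTIONS (not from this page): 2 sockets per node and
# 16 double-precision operations per cycle per core.
peak = rpeak_gflops(sockets=2, cores_per_socket=10, ghz=2.6, ops_per_cycle=16)
print(f"Rpeak = {peak:.1f} GFLOPS")
```

Changing any of the assumed values changes the result proportionally, so the figure is only as reliable as the socket count and operations per cycle supplied for the node in question.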

However, this is a theoretical limit that is unlikely to be reached: doing so would require a continuous flow of data (operands) from memory, which cannot be sustained, despite the caches, because of the latency and lower clock frequency of memory (the bottleneck).

To evaluate the real behaviour of a system, performance tests, or benchmarks, are run: synthetic problems executed to measure performance empirically.

The most widely used in HPC is Linpack. It consists of factoring and solving a random dense system of linear equations (Ax=b) in double precision. This test is floating-point intensive and is not greatly affected by the memory bottleneck, so it measures the computing capacity of the nodes and serves as a reference for comparing different systems.
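
The idea behind the test can be sketched with a small pure-Python program. This is not the Linpack code itself, only a minimal illustration under stated assumptions: it builds a random dense system, solves it by Gaussian elimination with partial pivoting, and converts the elapsed time into GFLOPS using the conventional Linpack operation count of 2/3·n^3 + 2·n^2. The names solve_dense and linpack_like, the problem size and the seed are hypothetical choices for this example.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (modifies a and b)."""
    n = len(b)
    for k in range(n):
        # Partial pivoting: move the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        row_k = a[k]
        for i in range(k + 1, n):
            row_i = a[i]
            m = row_i[k] / row_k[k]
            for j in range(k + 1, n):
                row_i[j] -= m * row_k[j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_like(n, seed=0):
    """Return (GFLOPS, max residual) for one random n x n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    a0 = [row[:] for row in a]  # untouched copies, used only to check the answer
    b0 = b[:]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # conventional Linpack operation count
    residual = max(
        abs(sum(a0[i][j] * x[j] for j in range(n)) - b0[i]) for i in range(n)
    )
    return flops / elapsed / 1e9, residual


gflops, residual = linpack_like(100)
print(f"{gflops:.4f} GFLOPS, max residual {residual:.2e}")
```

An interpreted implementation like this one reports a rate far below what the hardware can deliver; real measurements rely on optimized, compiled Linpack builds, so the sketch is useful only to show what is being timed and how the rate is derived.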

The following table shows the Linpack test run on the different PROTEUS nodes. Two tests were run on each system: the first used a single core and the second used all the cores in the node. From these data it is possible to estimate the relative speed of sequential and parallel programs.