PROTEUS is a computing service offered by the Institute Carlos I for Theoretical and Computational Physics to its members and collaborators. It provides an optimized environment for compute-intensive work on general problems and incorporates the latest trends in computing. It consists of a high-performance computing cluster with cloud computing services and cloud storage, and it serves more than 50 researchers from several countries, including Spain, Italy, Mexico and the United States.
The computing service is complemented by support for research and conferences, IT consulting, and related services.
Together with the Alhambra CPD, PROTEUS places the UGR among the top positions in Spanish scientific supercomputing.
Specifications and data of interest
PROTEUS in numbers:
- Computing power: ~27 Teraflops (27 × 10^12 floating point operations per second)
- Main memory: 4 Terabytes (nodes with 256, 96, 64, 48, 16 and 8 GB)
- Storage: 80 Terabytes of shared space and 140 TB for backups
- Execution cores: more than 1,300 cores (in nodes of 8, 12, 20 and 32 cores, from 2.33 GHz to 3.45 GHz)
- Number of nodes: 134 nodes
- Communication network: InfiniBand FDR for process communication, Gigabit Ethernet for management and I/O, with 10 Gb trunks between switches and storage nodes
- Executed jobs since 2007: 1,800,000
- Average duration of these jobs: 35 days
- Number of users: 50
- Ranking: among the top in Spain in scientific computing
Since its inception, PROTEUS has been well received and in high demand, which has required a steady series of improvements and expansions.
The supercomputing service at the iC1 was inaugurated in 1997. At that time it had 24 processors and a computing power of 200 GFLOPS. The execution environment was based on MOSIX, and user accounts were shared over NFS.
In 2004 it was expanded to 48 processors (500 GFLOPS).
The most radical improvement came in 2007. In addition to a considerable increase in power (160 processors and 1500 GFLOPS), the iC1 brought in a computer engineer who made major changes to the system: the Condor queue manager and the GlusterFS distributed file system were introduced.
In 2008, a new expansion brought the system to 600 processors and 5500 GFLOPS. Improvements to the environment included new program restore points, data redundancy and secondary storage for backups.
In 2012 the system was expanded to 1100 processors and 13000 GFLOPS. The innovations were better control over parallel and memory-hungry programs, private cloud storage and the incorporation of programmable graphics cards (GPGPU). The file system moved to CephFS.
The most recent expansion was in 2015, again adding memory and processors. A low-latency InfiniBand FDR network was created to connect the newest nodes so that they can run distributed jobs with MPI.
In 2016 the management servers were reinforced by virtualizing the management nodes, so that the cluster is more robust against hardware failures and can keep running even when such failures bring down some nodes. The I/O backbone was upgraded with 10G connections. Backups are now made to tape. The Lustre file system was set up for workloads with high I/O rates and parallel writes.
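As a rough illustration, the expansion figures quoted in the history above can be tabulated to see how aggregate power, and power per processor, grew from one expansion to the next. The sketch below uses Python; the names `EXPANSIONS` and `growth_table` are hypothetical and chosen only for this example. The 2015 and 2016 upgrades are left out because no processor or GFLOPS figures are given for them.

```python
# Expansion figures as quoted in the history above: (year, processors, GFLOPS).
# The names used here are illustrative only and are not part of PROTEUS.
EXPANSIONS = [
    (1997, 24, 200),
    (2004, 48, 500),
    (2007, 160, 1500),
    (2008, 600, 5500),
    (2012, 1100, 13000),
]


def growth_table(expansions):
    """Return one row per expansion:
    (year, processors, gflops, gflops_per_processor, growth_vs_previous).
    growth_vs_previous is None for the first entry."""
    rows = []
    previous_gflops = None
    for year, processors, gflops in expansions:
        per_processor = gflops / processors
        factor = None if previous_gflops is None else gflops / previous_gflops
        rows.append((year, processors, gflops, per_processor, factor))
        previous_gflops = gflops
    return rows


if __name__ == "__main__":
    for year, procs, gflops, per_proc, factor in growth_table(EXPANSIONS):
        growth = "n/a" if factor is None else f"x{factor:.2f}"
        print(f"{year}: {procs:>5} processors, {gflops:>6} GFLOPS, "
              f"{per_proc:.2f} GFLOPS/processor, growth {growth}")
```

Running the script prints one line per expansion. The per-processor column is only as meaningful as the quoted totals, since the text does not say how the GFLOPS figures were measured.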