GNU/LLVM Compilers

The GNU family of compilers produces highly optimized code for Intel and AMD CPUs. Because the LLVM C and C++ compilers deliberately share the majority of their optimization flags with their GNU equivalents, the information here applies to both sets of compilers. As with all compilers, the output of programs compiled with optimization should be double-checked for accuracy. If the numeric output is incorrect or lacks the desired accuracy, less aggressive compile options should be tried. The following table summarizes some relevant commands on the SCC for the GNU compilers:

Command                  Description
module avail gcc         List available versions of the GNU compilers.
module load gcc/6.2.0    Load a particular version.
gcc                      GNU C compiler.
g++                      GNU C++ compiler.
gfortran                 GNU Fortran 90/95/2003/etc. compiler.
g77                      GNU Fortran 77 compiler.

The LLVM compiler commands are summarized here:

Command                   Description
module avail llvm         List available versions of the LLVM compilers.
module load llvm/4.0.0    Load a particular version.
clang                     LLVM C compiler.
clang++                   LLVM C++ compiler.

Manuals are available for all of the compilers after their modules are loaded:

man g++
man gfortran

The optimization flags of the GNU Compiler Collection are described in its online documentation.
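The accuracy check recommended at the top of this section can be sketched as building the same source at two optimization levels and comparing what the two binaries print. The file sum.c, the binary names, and the verdict variable are hypothetical and created by the script itself. The compiler is assumed to be reachable as gcc (override with CC). None of this comes from the SCC documentation.

```shell
#!/bin/sh
# Sketch: compare the output of a conservative and an aggressive build.
# sum.c and the binary names are hypothetical; CC defaults to gcc.
CC=${CC:-gcc}
workdir=$(mktemp -d)

cat > "$workdir/sum.c" <<'EOF'
#include <stdio.h>
int main(void) {
    /* Sum many small terms; the result is sensitive to operation order. */
    float sum = 0.0f;
    int i;
    for (i = 1; i <= 1000000; i++) sum += 1.0f / (float)i;
    printf("%.9g\n", sum);
    return 0;
}
EOF

verdict="skipped"   # kept if the compiler is missing or a build fails
if command -v "$CC" >/dev/null 2>&1 &&
   "$CC" -O0    -o "$workdir/sum_ref"  "$workdir/sum.c" &&
   "$CC" -Ofast -o "$workdir/sum_fast" "$workdir/sum.c"; then
    ref=$("$workdir/sum_ref")
    fast=$("$workdir/sum_fast")
    if [ "$ref" = "$fast" ]; then
        verdict="identical"
    else
        verdict="different"   # a cue to retry with -O2 or -O1
    fi
fi
echo "reference: ${ref:-n/a}  optimized: ${fast:-n/a}  ($verdict)"
rm -rf "$workdir"
```

For a real code the comparison would normally use a tolerance suited to the application instead of an exact string match.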

General compile options

The basic optimization flags are listed below. Using these flags does not cause incompatibilities between different CPU architectures.

Flag                Description
-O1                 Optimized compilation.
-O2                 Adds more optimizations. This is the recommended flag for most codes.
-O3                 Adds more aggressive optimizations than -O2, at the cost of longer compile times. Recommended for loops with intensive floating-point calculations.
-Ofast              -O3 plus some extras. The GNU documentation notes that this option can break strict standards compliance. It can cause loss of accuracy in math operations. Not advisable for scientific computing.
-flto               (GNU only) Link-time optimization, a step that examines function calls between files when the program is linked. This flag must be used both when compiling and when linking. Compile times are very long with this flag; however, depending on the application, there may be appreciable performance improvements when it is combined with the -O* flags. This flag and any optimization flags must be passed to the linker, and gcc/g++/gfortran should be called for linking instead of calling ld directly.
-mtune=processor    Does additional tuning for specific processor types; however, it does not generate extra SIMD instructions, so there are no architecture compatibility issues. The tuning involves optimizations for processor cache sizes, preferred ordering of instructions, and so on. The useful values of processor on the SCC Intel nodes are: intel, broadwell, haswell, ivybridge, sandybridge, or nehalem. On the AMD nodes the value to use is bdver1.
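The requirement that -flto and the optimization flags appear at both stages can be sketched as a two-step build. The files main.c and util.c, the helper build_lto, and the variable lto_result are hypothetical names created by the script for illustration. The point is that the same flags are given on the compile lines and again on the link line, and that the link goes through the compiler driver and not ld.

```shell
#!/bin/sh
# Sketch of a two-step build with link-time optimization.
# main.c, util.c and build_lto are hypothetical names used for illustration.
CC=${CC:-gcc}
OPTFLAGS="-O2 -flto"
workdir=$(mktemp -d)

printf 'int twice(int x) { return 2 * x; }\n' > "$workdir/util.c"
cat > "$workdir/main.c" <<'EOF'
#include <stdio.h>
int twice(int x);
int main(void) { printf("%d\n", twice(21)); return 0; }
EOF

build_lto() {
    # Compile step: every source file gets the optimization flags and -flto.
    # $OPTFLAGS is left unquoted on purpose so it splits into two options.
    "$CC" $OPTFLAGS -c "$workdir/util.c" -o "$workdir/util.o" || return 1
    "$CC" $OPTFLAGS -c "$workdir/main.c" -o "$workdir/main.o" || return 1
    # Link step: the same flags again, passed to the compiler driver, not ld.
    "$CC" $OPTFLAGS "$workdir/util.o" "$workdir/main.o" -o "$workdir/demo"
}

lto_result="skipped"   # kept if the compiler is missing or the build fails
if command -v "$CC" >/dev/null 2>&1 && build_lto; then
    lto_result=$("$workdir/demo")
fi
echo "program output: $lto_result"
rm -rf "$workdir"
```

With LTO the compiler is free to inline twice into main even though the two functions live in different files, which is the kind of cross-file optimization the flag is meant to enable.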

 

Flags for specifying SIMD instructions

These flags enable the use of SIMD instructions to produce faster programs, at the cost of losing compatibility across CPU architectures. Always keep this in mind when using them.

Flag             Description
-march=native    Creates an executable that uses SIMD instructions based on the CPU that is compiling the code. Additionally, it includes the optimizations from the -mtune=native flag.
-march=arch      Generates SIMD instructions for a particular architecture and applies the -mtune optimizations. The useful values of arch are the same as for the -mtune flag above.
-msse4.2         Generates code with SSE4.2 instructions. Not compatible with the artemis nodes.
-mavx            Generates code with AVX instructions. Not compatible with the artemis, calypso, or kratos nodes.
-mavx2           Generates code with AVX2 instructions. This requires an additional module to be loaded before compiling: module load binutils/2.28. Code compiled with this flag will not be able to run on Nehalem, Sandybridge, Ivybridge, or the AMD Bulldozer cores.

Before using the -mavx2 option, load a newer version of the linker program that will properly handle AVX2 instructions:

module load binutils/2.28

Default optimization behavior

Most open-source programs are configured to compile with the -O2 or -O3 flags, which does not cause any compatibility problems. Occasionally they use the -march=native flag by default, which can cause compatibility problems. The PROTEUS login node is a virtual machine that does not correspond to any real architecture. In that case it is necessary to replace that flag with a more suitable one or to recompile on the compute node before running.
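To see what -march=native would select on a given machine, GCC can be asked to print its resolved target options. The sketch below uses GCC's -Q --help=target report. It is GCC-specific, so it reports nothing useful if the compiler behind CC is missing or rejects that option. The variable native_report is a name introduced for this sketch.

```shell
#!/bin/sh
# Sketch: show which -march/-mtune values GCC selects for -march=native
# on the machine where this runs (login and compute nodes may differ).
CC=${CC:-gcc}
native_report=$("$CC" -march=native -Q --help=target 2>/dev/null |
                grep -E '^[[:space:]]*-m(arch|tune)=')
if [ -n "$native_report" ]; then
    echo "$native_report"
else
    echo "no report: $CC is missing or does not support -Q --help=target"
fi
```

Running this on the login node and again inside a job on a compute node gives a quick indication of whether a binary built with -march=native in one place is likely to be usable in the other.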

Recommendations

Most codes will be well optimized with the -O2 or -O3 flags plus the -msse4.2 flag. Programs that involve intensive floating-point calculations inside loops can additionally be compiled with the -march=arch flag. For maximum cross-compatibility across the SCC compute nodes and probably the highest performance, a combination of flags should be used:

g++ -O3 -msse4.2 -mtune=intel -c mycode.cpp

Floating-point intensive code can benefit from the use of -mavx or -mavx2 instead of -msse4.2, depending on the compute node that will be used.
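Which of the three flags fits a given node can be read from the CPU feature list that Linux exposes in /proc/cpuinfo. The helper pick_simd_flag below is a hypothetical name introduced for this sketch. It maps a feature list to the most capable of the flags discussed here and is meant to be run on the compute node itself, not on the login node.

```shell
#!/bin/sh
# pick_simd_flag (hypothetical helper): print the most capable SIMD flag
# from this page that the given space-separated CPU feature list supports.
pick_simd_flag() {
    case " $1 " in
        *" avx2 "*)   echo "-mavx2" ;;
        *" avx "*)    echo "-mavx" ;;
        *" sse4_2 "*) echo "-msse4.2" ;;
        *)            echo "" ;;   # none of the three is available
    esac
}

# Feature list of the current CPU; empty when /proc/cpuinfo has no flags line.
cpu_features=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2)
echo "suggested flag: $(pick_simd_flag "$cpu_features")"
```

The surrounding spaces in the patterns keep longer feature names such as avx512f from being mistaken for avx.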

Note that selecting specific SIMD instructions with the -mavx* flags or the -march=arch flag will restrict compatibility with compute nodes unless the job is submitted with this qsub flag: -l cpu_arch=compatible_arch. The compatible_arch value is an architecture name that matches the SIMD instructions. Alternatively, the qsub flag -l cpu_arch=\!compatible_arch can be used to exclude an incompatible architecture:

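g++ -O3 -march=haswell mycode.cpp -o mycode
qsub -l cpu_arch=haswell -b y mycode
# OR, since -march=haswell has produced AVX instructions,
# just exclude the Nehalem nodes
# (-march=haswell also enables AVX2; see the -mavx2 entry above
# for the other architectures that cannot run such code)
qsub -l cpu_arch=\!nehalem -b y mycode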

Another option is to compile the code as part of a batch job which completely avoids any architectural issues and allows for the maximum amount of optimizations. For example, a job that is submitted to run on a Buy-in node equipped with an Ivybridge architecture CPU could be compiled with tunings for that node. As a precaution the source is copied into $TMPDIR: