Node Performance

The standard compute nodes each have 2 Intel Xeon E5-2670 processors and 128GB of RAM. Each processor has 8 cores, giving 16 cores per node. Peak performance is about 166 GFLOPS per processor, or about 8 TFLOPS across all the standard nodes.
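The per-processor peak can be reproduced with a short calculation. This is only a sketch: the 2.6 GHz base clock and the 8 double-precision floating-point operations per core per cycle (AVX) are assumptions not stated above, and the node count it prints is merely what the 8 TFLOPS figure would imply, not a documented number.

```python
# Theoretical peak double-precision throughput of the standard nodes.
CORES_PER_PROCESSOR = 8   # from the text above
PROCESSORS_PER_NODE = 2   # from the text above
BASE_CLOCK_GHZ = 2.6      # assumption: base clock, Turbo Boost ignored
FLOPS_PER_CYCLE = 8       # assumption: AVX, 4 adds + 4 multiplies per core per cycle


def peak_gflops_per_processor():
    """Cores x clock (GHz) x floating-point operations per cycle."""
    return CORES_PER_PROCESSOR * BASE_CLOCK_GHZ * FLOPS_PER_CYCLE


def peak_gflops_per_node():
    """Both processors in one node."""
    return PROCESSORS_PER_NODE * peak_gflops_per_processor()


def implied_node_count(total_tflops):
    """Number of nodes that a quoted cluster-wide peak would correspond to."""
    return total_tflops * 1000 / peak_gflops_per_node()


print(f"per processor: {peak_gflops_per_processor():.1f} GFLOPS")
print(f"per node:      {peak_gflops_per_node():.1f} GFLOPS")
print(f"8 TFLOPS corresponds to about {implied_node_count(8):.0f} nodes")
```

These are theoretical peaks; real applications will normally achieve only a fraction of them.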

However, if you run memory-access-intensive jobs on all cores, they will likely run slower than you expect: each processor has only 4 memory channels, so with all 8 cores in use the jobs may start competing for memory access. If, on the other hand, the jobs spend some of their time waiting for external data access (e.g. disk reads) to complete, there may be no overall slow-down.

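So if you have very memory-intensive jobs (e.g. large matrix manipulation), you might want to declare that each task uses 2 cores, even if it really uses only one. Fewer tasks are then placed on each processor, so each task gets a larger share of the memory bandwidth.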
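The effect of halving the number of tasks per processor can be estimated with a simple even-split model. This is a rough sketch, assuming DDR3-1600 memory on all 4 channels (12.8 GB/s per channel); the installed memory speed is not stated above, and real tasks rarely share bandwidth perfectly evenly.

```python
# Even-split estimate of the memory bandwidth available to each task.
CHANNELS_PER_PROCESSOR = 4    # from the text above
GB_PER_S_PER_CHANNEL = 12.8   # assumption: DDR3-1600 (1600 MT/s x 8 bytes)


def bandwidth_per_task(tasks_per_processor):
    """Peak GB/s per task if the tasks on one processor share its channels evenly."""
    if tasks_per_processor < 1:
        raise ValueError("need at least one task per processor")
    total = CHANNELS_PER_PROCESSOR * GB_PER_S_PER_CHANNEL
    return total / tasks_per_processor


# 8 tasks per processor (one per core) versus 4 (each task declared as 2 cores).
for tasks in (8, 4):
    print(f"{tasks} tasks per processor: {bandwidth_per_task(tasks):.1f} GB/s each")
```

Under these assumptions, declaring 2 cores per task doubles the bandwidth available to each task, at the cost of leaving half the cores idle. Whether that is a net gain depends on how memory-bound the job really is.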

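The nodes use a non-uniform memory access (NUMA) architecture: 64GB of RAM is directly attached to each processor, and memory attached to the other processor is slower to reach. If your application uses more than 64GB of RAM, some of its memory accesses have to go via the other processor, so memory access speeds may be reduced.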
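As a rough guide to how much of a job's memory ends up on the far processor, the sketch below computes the smallest possible remote share for a given footprint. It assumes a single process pinned to one processor with local memory filled first; in practice somewhat less than 64GB is available locally, because the operating system and other jobs use some of it.

```python
# Lower bound on the share of a job's memory that must live on the other processor.
LOCAL_GB_PER_PROCESSOR = 64   # from the text above


def min_remote_fraction(footprint_gb, local_gb=LOCAL_GB_PER_PROCESSOR):
    """Fraction of the footprint that cannot fit in the local 64GB."""
    if footprint_gb <= 0:
        raise ValueError("footprint must be positive")
    overflow_gb = max(0.0, footprint_gb - local_gb)
    return overflow_gb / footprint_gb


for footprint in (48, 96, 128):
    share = min_remote_fraction(footprint)
    print(f"{footprint}GB job: at least {share:.0%} of its memory is remote")
```

A job that fits within 64GB can in principle keep all of its memory local, while a job using the full 128GB has at least half of it on the other processor.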

http://www.cpubenchmark.net/high_end_cpus.html