
Enforcing Core Counts

Running cluster jobs have access to exactly as many cores as were requested in the job submission. If you submit a job requesting 1 core and then start a program that uses 10 threads, all 10 threads will be time-sliced on that 1 core, and your job may run roughly 10 times slower than you expected.

So, be careful to specify how many cores your job actually needs; see the -c / --cpus-per-task option to sbatch.
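For example, a minimal batch script requesting 10 cores for a 10-thread program might look like the sketch below (the job name, time limit, and program name are placeholders):

#!/bin/bash
#SBATCH --job-name=threads_demo    # hypothetical job name
#SBATCH --cpus-per-task=10         # request 10 cores for this job's single task
#SBATCH --time=01:00:00            # hypothetical time limit
./my_program                       # hypothetical program that runs 10 threads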

Before the 2021 cluster update, the cluster did not enforce this core count restriction. If you suspect that a job you previously ran on the old cluster used more cores than were allocated to it, you can get some idea of whether that is true by using sacct to compare the elapsed time the job took with the total CPU time it used.

sacct --user=chris -S 2020-01-01 --format=JobID,JobName,AllocCPUS,elapsed,CPUTime,TotalCPU

CPUTime is the number of allocated cores multiplied by the elapsed wall-clock time, while TotalCPU is the CPU time the job actually consumed (user plus system) as measured by the OS. If TotalCPU is much greater than CPUTime, it is likely that your job was using more cores than were allocated. To estimate how many, convert the two times into minutes or seconds and divide TotalCPU by CPUTime; this ratio is the oversubscription factor, so multiply it by AllocCPUS to get the approximate number of cores actually used (for a 1-core allocation the ratio itself is the core count).
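A worked example with made-up numbers: if sacct reports AllocCPUS=1, Elapsed=10:00:00, CPUTime=10:00:00 and TotalCPU=80:00:00, then TotalCPU/CPUTime = 80/10 = 8, so the job was effectively running on about 8 cores while only 1 was allocated.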

Some programs decide for themselves how many threads/cores to use (e.g. Ropen with the MKL does this by default), so your code may unexpectedly run slower on the new cluster if it starts more threads than the cores you requested.
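One common fix (a sketch; the exact variable depends on the threading library your program uses) is to set the relevant environment variable in your batch script so the thread count matches the allocation. SLURM_CPUS_PER_TASK is set inside the job when you use --cpus-per-task:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK    # OpenMP-based programs
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK    # programs linked against Intel MKL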
