enforcing_core_counts

Differences

This shows you the differences between two versions of the page.

enforcing_core_counts [2021/06/29 16:04]
root
enforcing_core_counts [2022/01/26 10:01] (current)
root
Line 1:
 ===== Enforcing Core Counts =====
  
-**This page applies only to the new cluster.**
-
-On the old cluster SLURM was configured so that it tracked allocated cores on each node, but did not limit a job from using more cores than specified in the submission i.e. a job could be submitted with the default 1-core allocation but run a program that started 10 threads and actually start using 10 cores. This occasionally led to nodes being bogged down.
-
-On the new cluster jobs will have access to exactly as many cores as requested in the job submission. If you submit a job requesting 1 core, and start a program that uses 10 threads, all 10 threads will be time-sliced on 1 core. Your job might run something like 10 times slower than you expected it to.
+Running cluster jobs have access to exactly as many cores as requested in the job submission. If you submit a job requesting 1 core and start a program that uses 10 threads, all 10 threads will be time-sliced on that 1 core. Your job might run roughly 10 times slower than you expected!
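To make the 10-threads-on-1-core scenario concrete, here is a minimal sketch of a check that could be run inside a job before starting a threaded program. It assumes GNU coreutils `nproc`, which reports the processing units available to the current process; whether that number reflects the job's allocation depends on how the cluster enforces core limits. The helper name `check_threads` is hypothetical and not part of SLURM.

```shell
#!/bin/bash
# check_threads is a hypothetical helper, not a SLURM command.
# Usage: check_threads PLANNED_THREADS [AVAILABLE_CORES]
# If AVAILABLE_CORES is omitted, ask nproc how many processing units
# the current process is allowed to use.
check_threads() {
    planned=$1
    available=${2:-$(nproc)}
    if [ "$planned" -gt "$available" ]; then
        echo "oversubscribed: $planned threads on $available core(s)"
    else
        echo "ok: $planned threads on $available core(s)"
    fi
}

# The scenario from the text: 10 threads time-sliced on 1 core.
check_threads 10 1
```

Passing the core count explicitly, as in the last line, is only for illustration; inside a real job the second argument would normally be left out.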
  
 So, you should be more careful in specifying how many cores your job needs. See the "-c" or "--cpus-per-task" option to sbatch.
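As an illustration of the "--cpus-per-task" option, the following is a sketch of a batch script that requests 10 cores and sets the thread count to match. It assumes the program honours the OpenMP variable `OMP_NUM_THREADS`; the job name and the program `./my_threaded_program` are placeholders, and `thread_count` is a hypothetical helper. SLURM sets `SLURM_CPUS_PER_TASK` only when "--cpus-per-task" is given, so the sketch falls back to 1.

```shell
#!/bin/bash
#SBATCH --job-name=threads-demo   # placeholder job name
#SBATCH --cpus-per-task=10        # request 10 cores for this task

# thread_count is a hypothetical helper: it prints the number of cores
# requested with --cpus-per-task, or 1 if the variable is not set.
thread_count() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Tell an OpenMP program to start exactly as many threads as cores allocated.
OMP_NUM_THREADS="$(thread_count)"
export OMP_NUM_THREADS
echo "Using ${OMP_NUM_THREADS} thread(s)"

# Replace the next line with your own program (placeholder, left commented out):
# ./my_threaded_program
```

Deriving the thread count from the allocation, instead of hard-coding it in two places, keeps the request and the program in step if the core count is changed later.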
  
-If you suspect that a job you previously ran on the old cluster used more cores than were allocated to it, you might be able to get some idea of whether that is true using **sacct** to investigate the elapsed time that the job took to run compared to the total CPU time the job used.
+Before the 2021 cluster update, the cluster did not enforce this core count restriction. If you suspect that a job you previously ran on the old cluster used more cores than were allocated to it, you might be able to get some idea of whether that is true using **sacct** to investigate the elapsed time that the job took to run compared to the total CPU time the job used.
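One way to carry out the comparison described above is to divide the total CPU time by the elapsed time, which gives the average number of cores kept busy. The sketch below assumes the times come from the `Elapsed` and `TotalCPU` fields of sacct, in the `[DD-]HH:MM:SS` or `MM:SS.mmm` forms; the helper names `to_seconds` and `avg_cores`, the job ID 12345, and the sample times are all made up for illustration.

```shell
#!/bin/bash
# The fields can be listed with, for example (12345 is a made-up job ID):
#   sacct -j 12345 --format=JobID,AllocCPUS,Elapsed,TotalCPU

# to_seconds (hypothetical helper): convert [DD-]HH:MM:SS or MM:SS.mmm
# to whole seconds, truncating any fractional part.
to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $0
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3) {
            s = p[1] * 3600 + p[2] * 60 + p[3]
        } else if (n == 2) {
            s = p[1] * 60 + p[2]
        } else {
            s = p[1]
        }
        printf "%d\n", days * 86400 + s
    }'
}

# avg_cores (hypothetical helper): TotalCPU / Elapsed, to one decimal place.
# Usage: avg_cores TOTALCPU ELAPSED
avg_cores() {
    awk -v c="$(to_seconds "$1")" -v w="$(to_seconds "$2")" \
        'BEGIN { if (w > 0) { printf "%.1f\n", c / w } else { print "n/a" } }'
}

# Made-up sample: 10 hours of CPU time in 1 hour of wall time.
avg_cores 10:00:00 01:00:00   # prints 10.0
```

A result well above the job's `AllocCPUS` value would suggest that the job used more cores than were allocated to it; a result at or below it is what one would expect when the core count is enforced.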
  
 <code>
enforcing_core_counts.1624997066.txt.gz · Last modified: 2021/06/29 16:04 by root