===== Enforcing Core Counts =====
Jobs have access to exactly as many cores as were requested in the job submission. If you submit a job requesting 1 core (the default) but run a program that starts 10 threads, all 10 threads will be time-sliced on that 1 core, and your job might run something like 10 times slower than you expected.
So, you should be more careful in specifying how many cores your job needs. See the "…" page.
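As a minimal sketch (the job name, program name, core count, and time limit below are only placeholders), a batch script that tells SLURM how many cores the job's threads will use might look like this:

<code bash>
#!/bin/bash
#SBATCH --job-name=threaded-example   # placeholder job name
#SBATCH --ntasks=1                    # a single process
#SBATCH --cpus-per-task=10            # request 10 cores, one per thread the program starts
#SBATCH --time=01:00:00               # placeholder time limit

# my_threaded_program is a placeholder for whatever multi-threaded program you run
srun ./my_threaded_program
</code>

With ''--cpus-per-task'' matching the number of threads, each thread gets its own core instead of being time-sliced on one.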
Before the 2021 cluster update, the cluster did not enforce this core count restriction: SLURM tracked the cores allocated on each node but did not stop a job from using more cores than it had requested, which occasionally led to nodes being bogged down. If you suspect that a job you ran on the old cluster used more cores than were allocated to it, you might be able to get some idea of whether that is true by using **sacct** to compare the elapsed time the job took to run with the total CPU time the job used.
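As a rough sketch (the job ID 12345 is only a placeholder), an **sacct** query like the following reports the allocated CPUs, elapsed wall-clock time, and total CPU time for a completed job:

<code bash>
# 12345 is a placeholder job ID; substitute the ID of the job you want to check
sacct -j 12345 --format=JobID,AllocCPUS,Elapsed,TotalCPU,State
</code>

If TotalCPU is much larger than Elapsed multiplied by AllocCPUS, the job was probably running more threads than the cores it had been allocated.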