
Scheduling

The cluster is running SLURM's scheduler with:

  • “Fair share” job priorities.
  • Simple round-robin node selection.

The “Fair Share” algorithm assigns each submitted job a priority that decreases as the amount of cluster CPU time consumed by the submitting user in recent days increases. These priorities apply only to jobs waiting in the queue; they do not affect jobs that are already running.

The scheduler does not check which nodes are busy and try to avoid them. This has an advantage in that it tends to leave some nodes empty for people who need a whole node.

Limiting the Number of Nodes Your Jobs Use

You can use the -w option to request a specific node. Strictly, it asks for “at least” the nodes in the node list you specify, so a command like:

sbatch -n 20 -w node2 my_script

would get you some cores on node2 and some on another node (since node2 has only 16 cores in total). If no cores were free on node2, the job would be queued until some became available.

Note that using the -w option with multiple nodes is not a way of queueing jobs on just those nodes: it allocates cores on every node you specify and runs the job script on just the first of them. For example:

sbatch -w node[2-4] my_script

would allocate one core on each of nodes 2, 3 and 4 and run my_script on node2. You would then use srun from within your script to run job steps within this allocation of cores. Do not use this to limit the nodes you want your jobs to run on.

You can use the -x option to avoid specific nodes. A list of node names looks like this:

node[1-4,7,11]

Read as “nodes 1 to 4, 7 and 11”, i.e. 1, 2, 3, 4, 7, 11.
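
To make the node list notation concrete, here is a minimal bash sketch that expands a simple list of the form prefix[ranges] into individual node names. The function name expand_nodelist is made up for illustration and is not part of SLURM; it only handles one bracketed group with plain numbers. On the cluster itself, scontrol show hostnames should do the same job for real node lists.

```shell
#!/bin/bash
# Hypothetical helper (not a SLURM command): expand "prefix[a-b,c,...]" into names.
expand_nodelist() {
    local spec="$1"
    local prefix="${spec%%\[*}"   # text before the '[' (e.g. "node")
    local body="${spec#*\[}"      # text after the '[' (e.g. "1-4,7,11]")
    body="${body%\]}"             # drop the trailing ']'
    local part i
    local IFS=','                 # split the body on commas
    for part in $body; do
        if [[ "$part" == *-* ]]; then
            # A range such as 1-4: print every number from start to end.
            for ((i = ${part%-*}; i <= ${part#*-}; i++)); do
                echo "${prefix}${i}"
            done
        else
            # A single node number such as 7.
            echo "${prefix}${part}"
        fi
    done
}

# Join the expanded names with commas so they are easy to read on one line.
expanded="$(expand_nodelist 'node[1-4,7,11]' | paste -sd, -)"
echo "$expanded"
```

The same list can be passed directly to the -x option, for instance sbatch -x node[1-4,7,11] my_script, to keep a job off those six nodes.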

You can use -c 16 to request all cores on a (standard) node.

You can use the --exclusive option to ask for exclusive access to all the nodes your job is allocated. This is especially useful if you have a program which attempts to use all the cores it finds. Please only use it if you need it.

You can also use an array job to limit the number of sub-tasks running at any one time. This is a very good choice, and is highly recommended, if you are submitting a lot of jobs. An array job occupies just one slot in the “pending jobs” section of the output of squeue.
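
As a rough sketch of the array job approach, a batch script along these lines would submit 100 sub-tasks with at most 10 running at once. The %10 suffix on --array is sbatch's limit on simultaneously running sub-tasks; the range 1-100, the limit of 10 and the input_N.dat file names are assumptions invented for illustration. The fallback value of 1 for SLURM_ARRAY_TASK_ID is there only so the script can be tried by hand outside SLURM.

```shell
#!/bin/bash
#SBATCH --array=1-100%10   # 100 sub-tasks, no more than 10 running at a time (illustrative numbers)
#SBATCH -n 1               # each sub-task asks for a single core

# SLURM sets SLURM_ARRAY_TASK_ID for each sub-task; default to 1 when run by hand.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per sub-task.
input_file="input_${task_id}.dat"

echo "sub-task ${task_id} would process ${input_file}"
```

The script is submitted once with sbatch, and each sub-task picks its own input from its task ID, so there is no need to submit 100 separate jobs.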

scheduling.1583958279.txt.gz · Last modified: 2020/03/11 16:24 by root