====== Scheduling ======
  
The scheduler does not check which nodes are busy and try to avoid them. This has an advantage in that it tends to leave some nodes empty for people who need a whole node.

==== Limiting the Number of Nodes Your Jobs Use ====

If you submit many jobs to the cluster (perhaps using individual **sbatch** commands) they will use any of the nodes/cores available on the cluster (in the partition to which you submitted your job). If you submit, say, 1000 jobs you may well fill all available cores on the cluster and leave no resources for other users. So it is a good idea to limit the number of jobs that are running at any one time. An **array job** is an ideal way of doing this: it makes only a single entry in the job queue (rather than 1000 in our example) and it lets you specify the maximum number that should be run at any one time.
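As a minimal sketch (**my_program** and its input files are hypothetical names here), an array job script for our 1000-job example might look like this, with **%50** capping the number of sub-tasks running at once at 50:

<code>
#!/bin/bash
#SBATCH --array=1-1000%50   # 1000 sub-tasks, at most 50 running at any one time

# Each sub-task uses the task ID Slurm provides to pick its own input
my_program input_${SLURM_ARRAY_TASK_ID}.dat
</code>

You submit this script once with **sbatch** rather than running 1000 separate **sbatch** commands.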

Array jobs are described in more detail on the [[sbatch]] page.

Some other options that may be useful are described below.

=== Sending a Job to a Specific Node ===

You can use the **-w** option to select a specific node. Note that it actually asks for "at least" the nodes in the node list you specify. So a command like:

<code>
sbatch -n 20 -w node2 my_script
</code>

would get you some cores on node2 and some on another node (since there are only 16 cores in total on node2). If there were no cores free on node2 the job would be queued until some became available.
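If what you actually want is the whole of node2 and nothing more, one approach (a sketch, assuming node2's 16 cores are enough for your job) is to request no more cores than the node has:

<code>
sbatch -n 16 -w node2 my_script
</code>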

Note that using the **-w** option with multiple nodes is not a way of queueing jobs on just those nodes: it will actually allocate cores across all the nodes you specify and run the job on just the first of them. For example:

<code>
sbatch -w node[2-4] my_script
</code>

would allocate one core on each of nodes 2, 3 and 4 and run my_script on node2. You would then use **srun** from within your script to run job steps within this allocation of cores. Don't use this to limit the nodes you want your jobs to run on.
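As a sketch of what such a script might contain (**my_program** is a hypothetical executable; the allocation is the three single-core allocations from the example above):

<code>
#!/bin/bash
# Step 1: run one copy of my_program on each of the three allocated nodes
srun -N 3 -n 3 my_program

# Step 2: run a single task somewhere within the same allocation
srun -N 1 -n 1 my_program
</code>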

=== Excluding Some Nodes ===

You can use the **-x** option to avoid specific nodes. A list of node names looks like this:

<code>
node[1-4,7,11]
</code>

This is read as "nodes 1 to 4, 7 and 11", i.e. nodes 1, 2, 3, 4, 7 and 11.
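For example, to submit **my_script** while avoiding all six of those nodes:

<code>
sbatch -x node[1-4,7,11] my_script
</code>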

You can use **-c 16** to request all cores on a (standard) node.
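For example (a sketch reusing **my_script** from above):

<code>
sbatch -c 16 my_script
</code>

Since **-c** requests cores per task, this allocates 16 cores on a single node.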

You can use the **--exclusive** option to ask for exclusive access to all the nodes your job is allocated. This is especially useful if you have a program which attempts to use all the cores it finds. Please only use it if you need it.
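For example, this sketch runs a single task but keeps the whole node for it, so a program that starts a thread on every core it finds cannot trample on other users' jobs:

<code>
sbatch -n 1 --exclusive my_script
</code>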

You can also use an array job to limit the number of sub-tasks running at any one time. This is a very good choice, and is highly recommended, if you are submitting a lot of jobs. An array job occupies just one slot in the "pending jobs" section of the output of **squeue**.

  