This shows you the differences between two versions of the page.
Old: submitting_many_jobs [2020/03/11 17:17] root
New: submitting_many_jobs [2021/02/20 09:51] (current) root
@@ old line 1, new line 1 @@
====== Submitting Many Jobs ======
+
+ You may need to run many (possibly similar) tasks on the cluster. You can do this by submitting each task as a separate job; techniques for doing that are discussed below. You can also submit a small number of jobs (possibly just one) and have those jobs execute the many tasks you need to run. Please try not to submit thousands of separate jobs into the queue.
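A minimal sketch of the second approach, in which one job works through many tasks in a loop, assuming a bash job script. The job name, the `process_one` function and the `file_N.fasta` inputs are hypothetical placeholders. The dummy inputs exist only so the sketch can run outside the cluster.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=many_tasks
#SBATCH -n 1
# Sketch: one job that runs many small tasks in a loop, instead of
# putting one job per task into the queue.
# process_one and the file_N.fasta inputs are hypothetical stand-ins.
set -eu

process_one() {
    # Placeholder for the real work: record the line count of one input.
    wc -l < "$1" > "$1.count"
}

workdir=$(mktemp -d)
cd "$workdir"

# Create three dummy inputs so the sketch can be tried anywhere.
for i in 1 2 3; do
    printf 'ACGT\n' > "file_${i}.fasta"
done

# Every task runs inside this single job, one after another.
for f in file_*.fasta; do
    process_one "$f"
done
echo "results written to ${workdir}"
```

If you submit this with a plain `sbatch` (the script name is up to you), it occupies one queue entry no matter how many files the loop visits.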
To submit many jobs to the cluster you can:
@@ old line 18, new line 20 @@
* Use the **-n** option on **sbatch** and **srun** within the script to start multiple copies of a program.
- * You can't exceed the number of cores available with this method (the job will be rejected).
+ * You can't exceed the number of cores available with this method (the job will be rejected).
* You can get each of your tasks to do something a little different (e.g. processing a different file) by using the SLURM_PROCID environment variable.
* The "
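The sketch below shows how each copy started by srun could choose its own input from SLURM_PROCID. It assumes inputs named `file_1.fasta`, `file_2.fasta`, ... and a wrapper script called process_one.sh; both the naming scheme and the script name are hypothetical.

```shell
#!/usr/bin/env bash
# process_one.sh (hypothetical name): started several times by
#     srun ./process_one.sh
# inside a job submitted with e.g. "sbatch -n 4". Each copy sees a different
# SLURM_PROCID (0, 1, 2, ...) and uses it to choose its own input file.
# The file_N.fasta naming scheme is an assumption for illustration.
set -eu

input_for_task() {
    # Use an explicit argument if given, else SLURM_PROCID, else 0.
    # SLURM_PROCID counts from 0; the hypothetical inputs are numbered from 1.
    procid=${1:-${SLURM_PROCID:-0}}
    echo "file_$((procid + 1)).fasta"
}

input=$(input_for_task)
echo "task ${SLURM_PROCID:-0} would process ${input}"
# ./my_program "$input"    # hypothetical program
```

Suppose the job was submitted with `sbatch -n 4`. A line `srun ./process_one.sh` in the job script would then start four copies, and SLURM_PROCID values 0 to 3 would select file_1.fasta through file_4.fasta.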
@@ old line 37, new line 39 @@
<
- sbatch -n 2 run_multi_job_o
+ sbatch -n 2 run_multi_job_%t
</
@@ old line 49, new line 51 @@
Some other options for limiting the number of cores your jobs use are described below.
- | === Sending a Job to a Specific Node === | + | ==== Singleton Jobs ==== |
+ | |||
+ | You can use the **--dependency** option of sbatch to make slurm run just one of your jobs at a time. Suppose you submit 10 jobs with the same job name, using the **--dependency=singleton** option will make slurm run these jobs one at a time. | ||
+ | |||
+ | < | ||
+ | for i in $(seq 1 10); do | ||
+ | sbatch --job-name oneatatime --dependency=singleton my_script.bash file_${i}.fasta | ||
+ | done | ||
+ | </ | ||
+ | |||
+ | ==== Sending a Job to a Specific Node ==== | ||
You can use the **-w** option to select a specific node. Strictly speaking, it asks for "at least" the nodes in the node list you specify, so a command like:
@@ old line 67, new line 79 @@
would allocate one core on each of nodes 2, 3, and 4 and run my_script on node2. You would then use srun from within your script to run job steps within this allocation of cores. Don't use **-w** as a way of limiting which nodes your jobs run on.
- === Excluding Some Nodes ===
+ ==== Excluding Some Nodes ====
You can use the **-x** option to avoid specific nodes. A list of node names looks like this:
@@ old line 81, new line 93 @@
You can use the **--exclusive** option to ask for exclusive access to all the nodes your job is allocated. This is especially useful if you have a program that tries to use all the cores it finds. Please use it only if you need it.
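Here is a sketch of an exclusive job script. It assumes a threaded program that takes a thread count; the name `my_threaded_program` and its `--threads` flag are hypothetical. `nproc` reports the processing units available to the process, and `getconf _NPROCESSORS_ONLN` is used as a fallback where `nproc` is missing.

```shell
#!/usr/bin/env bash
#SBATCH --exclusive
#SBATCH -N 1
# Sketch: with --exclusive no other job shares the node, so a program that
# starts one thread per core it detects does not compete with other jobs.
# my_threaded_program and its --threads flag are hypothetical.
set -eu

# Count the processing units this process can use.
threads=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "would start my_threaded_program with ${threads} threads"
# ./my_threaded_program --threads "$threads"
```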
- === Using Job Steps ===
+ ==== Using Job Steps ====
+
+ Files: test-multi.sh,
If you use **srun** within an **sbatch** script, the cores used by the programs you start with srun can be sub-allocated from the cores allotted to the sbatch script. For example, the following script, test-multi.sh,
@@ old line 106, new line 120 @@
</
- The "&"
- Similarly the "
+ The "--ntasks 1"
+ The "
+
+ The "&"
+
+ Similarly the "
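Below is a minimal sketch of the job-step pattern this section discusses, assuming a bash job script. Each step is started with `srun --ntasks 1` in the background with `&`, and `wait` keeps the script running until every step has finished. The `run_step` helper is hypothetical: it falls back to running the command directly when there is no Slurm job. A short `sleep` stands in for pause.sh. Only the `--ntasks 1` option is shown here, so check the srun documentation for your Slurm version to see how concurrent steps share the job's cores.

```shell
#!/usr/bin/env bash
#SBATCH -n 3
# Sketch of job steps: start several one-task steps in the background
# with "&", then "wait" until all of them have finished.
# run_step is a hypothetical helper; outside a Slurm job it runs the
# command directly so the pattern can be tried on any machine.
set -eu

run_step() {
    if [ -n "${SLURM_JOB_ID:-}" ] && command -v srun >/dev/null 2>&1; then
        srun --ntasks 1 "$@"    # one task per step, inside the job's allocation
    else
        "$@"                    # local fallback for testing
    fi
}

outdir=$(mktemp -d)
start=$(date +%s)
for i in 1 2 3; do
    # Each step stands in for pause.sh: sleep, then leave a marker file.
    run_step sh -c 'sleep 1; echo "step $1 done" > "$2/step_$1.txt"' step "$i" "$outdir" &
done
wait    # without this the script could exit, ending the job while steps still run
elapsed=$(( $(date +%s) - start ))
echo "all steps finished in about ${elapsed}s"
```

The three steps run concurrently, so the whole loop takes roughly as long as one step rather than three.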
Some example code for pause.sh could be as follows.
@@ old line 157, new line 175 @@
So the "
- This technique also works "
+ This technique also works "
<
srun: Warning: can't run 1 processes on 3 nodes, setting nnodes to 1
</
-