====== Submitting Many Jobs ======

You may need to run many (possibly similar) tasks on the cluster. You can do this by submitting each task as a separate job; techniques for doing that are discussed below. You could also do it by submitting a small number of jobs (possibly even just one) and having those jobs execute the many tasks you need to run. You should try not to submit thousands of separate jobs into the queue.
  
To submit many jobs to the cluster you can:
  
  * Use the **-n** option on **sbatch** and **srun** within the script to start multiple copies of a program.
    * You can't exceed the number of cores available with this method (the job will be rejected).
    * You can get each of your tasks to do something a little different (e.g. processing a different file) by using the SLURM_PROCID environment variable, as in the sketch below.
    * The "-l" option on srun will label output lines with the task number.

<code>
sbatch -n 2 run_multi_job_%t
</code>
  
  
Some other options for limiting the number of cores your jobs use are described below.

==== Singleton Jobs ====

You can use the **--dependency** option of sbatch to make slurm run just one of your jobs at a time. For example, if you submit 10 jobs with the same job name and the **--dependency=singleton** option, slurm will run those jobs one at a time.

<code>
for i in $(seq 1 10); do
    sbatch --job-name oneatatime --dependency=singleton my_script.bash file_${i}.fasta
done
</code>
  
==== Sending a Job to a Specific Node ====
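
As a minimal sketch (the node name node01 is hypothetical), sbatch's **--nodelist** option asks slurm to place the job on particular nodes:

<code>
# Ask slurm to run the job on node01 specifically.
sbatch --nodelist=node01 my_script.bash
</code>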
  
==== Using Job Steps ====

Files: test-multi.sh, pause.sh
  
If you use **srun** within an **sbatch** script, the cores used by the srun job steps are sub-allocated from the cores allotted to the sbatch script. For example, the following script, test-multi.sh, which is to be submitted using **sbatch**, specifies 3 tasks (defaulting to one core each).
of as providing a mechanism for resource management to the job within its allocation.
</code>

The "--ntasks 1" on each srun command is important: without it, srun will start the pause.sh script on every core allocated to the sbatch command (3 in this case).

The "--exclusive" on each srun command is important, as discussed above.
  
 The "&" at the end of the srun line is important: else each srun will block, causing the srun steps to be executed consecutively. The "&" at the end of the srun line is important: else each srun will block, causing the srun steps to be executed consecutively.
 So the "--ntasks 3" sbatch option, and the "--ntasks 1 --exclusive" on the srun command limited the number of processes running at any one time to 3. So the "--ntasks 3" sbatch option, and the "--ntasks 1 --exclusive" on the srun command limited the number of processes running at any one time to 3.
  
This technique also works "across nodes", i.e. if I specify "--ntasks 50" as an sbatch option I will get job steps run on multiple nodes (because the nodes have fewer than 50 cores each). In this case you will see messages from slurm saying:
  
<code>