====== sbatch ======
  
**sbatch** submits a script to the slurm controller for execution and returns immediately. The script may be queued to run later if there are currently no resources (cores or memory) available for it. The script is "detached" from your terminal, so it will run, or continue to run, even if you log out.
  
The working directory for the script will be set to the current working directory when you submit the job.
<code>
#!/bin/bash
#SBATCH -p bigmem
#SBATCH -o gwas-test-%j.out
gwas-program filename1 filename2
</code>
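
If the script above is saved as, say, gwas.bash (the name is just for illustration), it can be submitted with:

  sbatch gwas.bash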
  
  sbatch -N4 ex1.bash one two

**Unless you intend to use srun from within your script, the -N option is probably not what you want.**
  
 The "-N4" option requests an allocation of 4 nodes for this job. **sbatch** only runs the script once. What it is doing is allocating 4 nodes and running the script on the first one. Tasks can be run on the allocated nodes by using srun from within your script. The "-N4" option requests an allocation of 4 nodes for this job. **sbatch** only runs the script once. What it is doing is allocating 4 nodes and running the script on the first one. Tasks can be run on the allocated nodes by using srun from within your script.
  
This does not reserve all 4 nodes for you exclusively: just one core per node (unless you add some more options).
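
For example, a minimal sketch of that pattern (hostname stands in here for a real program) runs one task on each of the 4 allocated nodes:

<code>
#!/bin/bash
#SBATCH -N4

# srun runs tasks across the allocation; here, one per node.
srun --ntasks-per-node=1 hostname
</code>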
  
==== Array Jobs ====
This produces 3 output files by default: slurm-NNNN_1.out, slurm-NNNN_2.out, and slurm-NNNN_3.out, where NNNN is the SLURM job number.
  
You can also submit an "array job" like this:

  sbatch --array="1-20%5" my_job_script.bash

This command will run your script 20 times (possibly on different nodes) with at most 5 copies running at any one time. (The "%5" part is optional.)

In the output of squeue you would see one line for the array job, and (in the example above) up to five lines for the currently running jobs.
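
For example, you can watch the array job and its running tasks with:

  squeue -u $USER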

You do have to do some work to make each job generated by the array do something different. The environment variable SLURM_ARRAY_TASK_ID is set, in each job, to that job's index number (in the example above it runs from 1 to 20), so you could use it, for instance, to select which of 20 different files to process. (The example script below reads a list of file names from a file and extracts the line matching the job's index number; there are plenty of other ways to do this too.)

This lets you submit a large number of jobs quickly and easily, and makes it easy to control the resources you are using. The output from each job goes, by default, to a file named slurm-JJJJ_AAA.out, where JJJJ is the slurm job id and AAA is the array index value.

<code>
#!/bin/bash

echo Job $SLURM_JOB_ID
echo Array index $SLURM_ARRAY_TASK_ID

# Read the list of files to be processed from array_files.txt.
mapfile -t files < array_files.txt
# Task IDs start at 1, but bash arrays are indexed from 0.
index=$(( SLURM_ARRAY_TASK_ID - 1 ))
echo File to be processed: ${files[$index]}
</code>
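
For example, assuming the script above is saved as process_files.bash (a name chosen here for illustration) and array_files.txt contains at least 20 file names, one per line, you could process them five at a time with:

  sbatch --array="1-20%5" process_files.bash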

If you have submitted an array job and change your mind about how many of its tasks should run at any one time (perhaps the cluster gets less busy, so it seems reasonable to run more at once), you can use scontrol to change that. Here NNNNNN is the job id of the array job:

<code>
scontrol update JobId=NNNNNN ArrayTaskThrottle=10
</code>

You can cancel all of an array job in one go by using scancel on the job number of the array job entry. You can also cancel the individual tasks submitted by the array job.

To cancel the entire job (including all running tasks):

<code>
scancel NNNNNNN
</code>

To cancel an individual task from the array job (in this case task number 2):

<code>
scancel NNNNNNN_2
</code>

You can also cancel a range of the individual tasks within the array as follows. This will work whether the tasks are already running or not.

<code>
scancel NNNNNNN_[9-19]
</code>
  