sbatch

sbatch submits a script to the SLURM controller for execution and returns immediately. If no resources (cores or memory) are currently available, the script may be queued to run later. The script is “detached” from your terminal, so it will run (or continue running) even if you log out.
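
For example (myscript.bash is just a placeholder name), sbatch prints the new job number and returns straight away:

$ sbatch myscript.bash
Submitted batch job 123456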

The working directory for the script is set to the directory you were in when you submitted the job.
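
If you want the script to run somewhere else, the --chdir (or -D) option sets its working directory instead. For example (/scratch/$USER is just an illustrative path):

sbatch --chdir=/scratch/$USER run-gwas-test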

By default, stdout and stderr of the script are sent to a file named slurm-%j.out where the “%j” is replaced by the SLURM job number. You can change this with the -o option, but you would likely still want to use the “%j” to prevent output from different jobs getting jumbled up.

sbatch -o gwas-test-%j.out run-gwas-test

You can also set sbatch options from within the script itself.

#!/bin/bash
#SBATCH -p bigmem
#SBATCH -o gwas-test-%j.out
gwas-program filename1 filename2

The “#SBATCH” lines should appear before any commands in the script.

sbatch will only run scripts (i.e. not object/executable programs). There is a --wrap option that will automatically wrap your program in a script, but it seems a little awkward to use. Write your own.
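
For reference, a --wrap call looks like this; sbatch generates a minimal script around the command string for you:

sbatch --wrap="hostname"

Quoting gets fiddly once the command takes arguments, which is another reason writing your own script (as below) is usually easier.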

ex1.bash

#!/bin/bash
hostname
echo Arg1 = $1
echo Arg2 = $2

sbatch ex1.bash one two

A short script like this, submitted with sbatch, can probably do everything you need.

ex2.bash

#!/bin/bash
#SBATCH -o ex2-%j.out
/bin/hostname
pwd
echo Arg1 is $1
echo Arg2 is $2

sbatch options

sbatch -N4 ex1.bash one two

Unless you intend to use srun from within your script, the -N option is probably not what you want.

The “-N4” option requests an allocation of 4 nodes for this job, but sbatch still runs the script only once: it allocates the 4 nodes and runs the script on the first of them. Tasks can be run on the other allocated nodes by using srun from within your script.

This does not reserve all 4 nodes for you exclusively: just one core per node (unless you add some more options).
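
As a minimal sketch (assuming the default of one task per node), this script uses srun to run hostname once on each of the 4 allocated nodes:

#!/bin/bash
#SBATCH -N4
#SBATCH -o multinode-%j.out
# srun starts one task per allocated node by default,
# so this prints four hostnames.
srun hostname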

Array Jobs

sbatch --array 1-22 one_chromo.bash
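
one_chromo.bash itself is not shown here. As a sketch, it might use the task ID directly as a chromosome number (chr1.vcf and so on are hypothetical input files):

#!/bin/bash
#SBATCH -o chromo-%A_%a.out
# SLURM_ARRAY_TASK_ID runs from 1 to 22, one task per chromosome.
gwas-program chr${SLURM_ARRAY_TASK_ID}.vcf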

ex3.bash

#!/bin/bash
echo Array job id = $SLURM_ARRAY_JOB_ID
echo Array task id = $SLURM_ARRAY_TASK_ID

sbatch --array 1-3 ex3.bash

This produces 3 output files by default: slurm-NNNN_1.out, slurm-NNNN_2.out, and slurm-NNNN_3.out, where NNNN is the SLURM job number.

You can also throttle how many tasks of an array job run at once:

sbatch --array="1-20%5" my_job_script.bash

This command will run your script 20 times (possibly on different nodes) with at most 5 copies running at any one time. (The “%5” part is optional.)

In the output of squeue you would see one line for the array job as a whole (covering the tasks still pending) and, in the example above, up to five lines for the currently running tasks.

You do have to do some work to make each task generated by the array do something different. The environment variable SLURM_ARRAY_TASK_ID is set, in each task, to that task's index number (in the example above it runs from 1 to 20). You could use it to select between 20 different files to be processed, for instance. (The example script below reads a list of file names from a file and extracts the line matching the task's index number. There are plenty of other ways to do this too.)

Array jobs let you submit a large number of jobs quickly and easily, and give you simple control of the resources you are using. The output from each task goes, by default, to a file named slurm-JJJJ_AAA.out, where JJJJ is the SLURM job id and AAA is the array index value.

#!/bin/bash

echo Job $SLURM_JOB_ID
echo Array index $SLURM_ARRAY_TASK_ID

# Read the list of files to be processed from array_files.txt,
# one file name per line.
mapfile -t files < array_files.txt
# Task IDs start at 1 but bash arrays are indexed from 0.
index=$(( SLURM_ARRAY_TASK_ID - 1 ))
echo "File to be processed: ${files[$index]}"
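
To build the file list and size the array to match it, something like this works (data/*.vcf is just an illustrative pattern, and process.bash is a stand-in name for the script above):

ls data/*.vcf > array_files.txt
sbatch --array=1-$(wc -l < array_files.txt) process.bash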

If you have submitted an array job and change your mind about how many of its tasks should run at any one time (perhaps the cluster gets less busy, so it seems reasonable to run more at once), you can use scontrol to change the throttle.

scontrol update JobId=NNNNNN ArrayTaskThrottle=10

You can cancel an entire array job in one go by using scancel on the job number of the array job entry. You can also cancel the individual tasks submitted by the array job.

To cancel the entire job (including all running tasks):

scancel NNNNNNN

To cancel an individual task from the array job (in this case task number 2):

scancel NNNNNNN_2

You can also cancel a range of the individual tasks within the array as follows. This will work whether the tasks are already running or not.

scancel NNNNNNN_[9-19]