This command submits a job to run “in the background”. Output is written to a file named slurm-NNNNN.out by default, where NNNNN is the job number SLURM assigns to your job.
sbatch [-p partition] [-c ncores] [--mem=NNNG] [--exclusive] scriptname
“--exclusive” requests all cores on a node; use it only when you need to. “-c” specifies how many cores your job will use. (Use only one of “-c” and “--exclusive”.) “--mem=0” requests all memory on a node. “scriptname” must be the name of a shell script (but see the “--wrap” option).
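As a sketch, a minimal batch script might look like the following (the script, program, and input file names are hypothetical; “standard” is used as the partition name because it appears elsewhere on this page):

#!/bin/bash
#SBATCH -p standard        # partition to run in
#SBATCH -c 4               # number of cores
#SBATCH --mem=16G          # memory for the whole job
./my_program input.dat     # hypothetical program and input

Submit it with “sbatch myscript.sh”; the #SBATCH lines supply the same options you could otherwise give on the sbatch command line.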
For submitting many jobs look at using the “--array” option.
sbatch --array=0-99%5 scriptname
This will run 100 copies of scriptname in total, but only allow 5 to be running at any one time. The script itself must work out how to do something different in each instance, using the SLURM_ARRAY_TASK_ID environment variable.
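For example, a sketch of an array script in which each task processes its own input file (the program and file names are hypothetical):

#!/bin/bash
#SBATCH --array=0-99%5
# Each task sees its own index (0-99 here) in SLURM_ARRAY_TASK_ID
./my_program input_${SLURM_ARRAY_TASK_ID}.dat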
On the new cluster you may need to use the “--mem” or “--mem-per-cpu” options to make sure that sufficient memory is allocated to your task. The default is 8GB per cpu/core.
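For example, to request 4 cores with 16GB per core (the numbers are illustrative):

sbatch -c 4 --mem-per-cpu=16G scriptname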
Check the job queue (all jobs, just your own, or the jobs on a particular node):

squeue
squeue -u USERNAME
squeue -w NODENAME
Check the state of partitions and nodes:

sinfo
sinfo -p standard -N -O "partition,nodelist,cpus,memory,cpusload"
use_by_user
“use_by_user” is a script that runs “scontrol” to get the information it reports.
scancel JOBID
scancel -u USERNAME
Note: not “skill”, which does exist but isn't part of SLURM. The second of these commands would kill all of your SLURM jobs.
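If you use job arrays, individual tasks can be cancelled by suffixing the task index; for example, to cancel task 5 of an array job:

scancel JOBID_5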
To show details of a job (this works for running, or very recently completed, jobs):
scontrol show job JOBID
This command will show how much (elapsed) time and memory a job used. This information is kept in the SLURM accounting database so the command can be used long after the job has completed.
sacct -o elapsed,maxrss -j NNNN.batch
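sacct can report many other fields; “sacct -e” lists them all. For example, to also see the job's state and the memory it requested:

sacct -o elapsed,maxrss,reqmem,state -j NNNN.batch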
Check on resources used by all of your jobs since a specific date:
sacct --user=chris -S 2020-01-01 -o elapsed,maxrss
To get information about a node including how many cores and how much memory it has:
scontrol show node node62
srun [-p partition] [-c ncores] [--exclusive] program
srun --pty bash -i
The second command above will get you a command line on a node. You can use the “-w” option to target a specific node. (Note that you will only get the command line if there is a free core on the node in question.) You can use this to check on your job's status, e.g. how much memory and how many cores it is using. This can also be done more programmatically in your scripts, or with sstat (for memory use), but this command-line technique can be useful sometimes.
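For example, to check the memory use of a running job with sstat (the .batch suffix selects the batch step of the job; sstat only works while the job is running):

sstat -j JOBID.batch -o maxrss,avecpu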
As a matter of etiquette, please don't start a shell on a node and then leave it running when you aren't using it. This reduces the number of nodes available for exclusive use by users who need one.
Check how much space is left on your home volume:
chris@node0:~$ cd
chris@node0:~$ pwd
/home3/chris
chris@node0:~$ df -H /home3
Filesystem                     Size  Used Avail Use% Mounted on
fs2:/srv/storage_2/node-home3  105T   91T   14T  88% /home3
chris@node0:~$
You should check this before you add a lot more data to your home directory, or run jobs that generate a lot of output. If you need more space than is available on your home volume please talk to the system administrators: we may be able to give you space on a different volume. Consider using /scratch for data that can easily be replaced (e.g. data downloaded from NCBI).
See space remaining on all home volumes:
chris@node0:~$ df -H /home*
Filesystem                     Size  Used Avail Use% Mounted on
fs1:/srv/storage_1/node-home    40T   33T  7.5T  82% /home
fs2:/srv/storage_1/node-home   105T   96T  8.6T  92% /home2
fs2:/srv/storage_2/node-home3  105T   91T   14T  88% /home3
fs3:/srv/storage_1/node-home    81T   51T   30T  63% /home4
fs4:/srv/storage_0/node-home5  118T   39T   79T  33% /home5
fs4:/srv/storage_1/node-home6   98T     0   98T   0% /home6
To check how much disk space a directory is using:
chris@node0:~$ du -sh torch
3.6G    torch
(This can take a long time if there are many files in the directory.)
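To see which subdirectories are taking up the space, one approach (using GNU du and sort) is:

du -h --max-depth=1 torch | sort -h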