This shows you the differences between two versions of the page.
cheat_sheet [2020/08/26 12:42] root |
cheat_sheet [2022/01/26 10:20] (current) root |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== SLURM Cheat Sheet ===== | + | ===== Cheat Sheet ===== |
=== Submit a job === | === Submit a job === | ||
Line 6: | Line 6: | ||
< | < | ||
- | sbatch [-p partition] [-c ncores] [--exclusive] scriptname | + | sbatch [-p partition] [-c ncores] [--mem=NNNG] [--exclusive] scriptname |
</ | </ | ||
- | " | + | " |
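The options on the sbatch command line above can also be written into the script itself as `#SBATCH` directives. The following is a minimal sketch rather than a site-specific recipe: the partition name `general` and the values 4 and 8G are placeholders chosen for illustration, not settings taken from this page.

```shell
#!/bin/bash
# Hypothetical partition name; replace with a partition that exists on your cluster.
#SBATCH -p general
# Number of cores, equivalent to "-c ncores" on the command line.
#SBATCH -c 4
# Memory request, in the same form as --mem=NNNG.
#SBATCH --mem=8G

# Slurm sets SLURM_CPUS_PER_TASK when -c is given; fall back to 1 so the
# script can also be tried outside of a job.
ncores="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${ncores} core(s)"
```

With the directives in place, the script can be submitted as `sbatch scriptname`; options given on the command line take precedence over those in the script.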
+ | |||
+ | For submitting many jobs look at using the " | ||
+ | |||
+ | < | ||
+ | sbatch --array=0-99%5 scriptname | ||
+ | </ | ||
+ | |||
+ | This will run 100 copies of **scriptname** in total, but allow only 5 to run at any one time. The script itself must work out how to do something different for each instance, using the SLURM_ARRAY_TASK_ID environment variable. | ||
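One way a script might use SLURM_ARRAY_TASK_ID is to derive an input file name from it. This is only a sketch: the `input_N.dat` naming scheme and the helper `input_for_task` are assumptions made for illustration, not part of the original cheat sheet.

```shell
#!/bin/bash
# Hypothetical array-job script: each instance picks its own input file.
# The input_N.dat naming scheme is an assumption for illustration only.

# Map an array index to an input file name.
input_for_task() {
    printf 'input_%s.dat\n' "$1"
}

# Slurm sets SLURM_ARRAY_TASK_ID for each array instance; default to 0 so
# the script can also be tried outside of a job.
task_id="${SLURM_ARRAY_TASK_ID:-0}"
input_file="$(input_for_task "$task_id")"

echo "Task ${task_id} would process ${input_file}"
```

Submitted with `sbatch --array=0-99%5 scriptname`, each of the 100 instances would see a different task ID from 0 to 99 and so select a different file.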
+ | |||
+ | On the new cluster you may need to use the " | ||
=== Check the queue === | === Check the queue === | ||
< | < | ||
+ | squeue | ||
squeue -u USERNAME | squeue -u USERNAME | ||
squeue -w NODENAME | squeue -w NODENAME | ||
Line 32: | Line 43: | ||
< | < | ||
scancel JOBID | scancel JOBID | ||
+ | scancel -u USERNAME | ||
</ | </ | ||
**Not " | **Not " | ||
+ | |||
+ | The second of these commands kills **all** of your Slurm jobs, so use it with care. | ||
=== Report Job Details === | === Report Job Details === | ||
+ | |||
+ | This works for running (or very recently completed) jobs. | ||
< | < | ||
scontrol show job JOBID | scontrol show job JOBID | ||
+ | </ | ||
+ | |||
+ | === Check on the Resources a Job Used === | ||
+ | |||
+ | This command shows how much elapsed time and memory a job used. Slurm keeps this information in its accounting database, so the command can be used long after the job has completed. | ||
+ | |||
+ | < | ||
+ | sacct -o elapsed, | ||
+ | </ | ||
+ | |||
+ | Check on resources used by all of your jobs since a specific date: | ||
+ | |||
+ | < | ||
+ | sacct --user=chris -S 2020-01-01 -o elapsed, | ||
</ | </ | ||
Line 58: | Line 88: | ||
The second command above will get you a command line on a node. You can use the " | The second command above will get you a command line on a node. You can use the " | ||
+ | |||
+ | As a matter of etiquette, please don't start a shell on a node and leave it running when you aren't using it. Idle shells tend to reduce the number of nodes available to users who need exclusive use of one. | ||
=== Checking Disk Space === | === Checking Disk Space === | ||
Line 73: | Line 105: | ||
</ | </ | ||
- | You should check this before you add a lot more data to your home directory. If you need more space than is available on your home volume please talk to the system administrators: | + | **You should check this before you add a lot more data to your home directory, or run jobs that generate a lot of output.** If you need more space than is available on your home volume please talk to the system administrators: |
See space remaining on all home volumes: | See space remaining on all home volumes: |