You should submit all long-running, compute-intensive work as batch jobs so that it runs on one of the compute nodes. You can write and test your code, and make trial runs of programs, format data, etc., on the head node, but please don't bog it down.
You can also submit an "array job". This submits the same job multiple times. You can use a SLURM-defined environment variable to distinguish between copies of the job from within the job script.
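For example, a minimal sketch of an array job script (the program and input file names are placeholders, not real files on the cluster):

<code bash>
#!/bin/bash
#SBATCH --array=1-10          # submit 10 copies of this job, numbered 1 to 10

# SLURM_ARRAY_TASK_ID holds this copy's number, so each copy can work on
# a different input file.  "my_prog" and the file names are placeholders.
./my_prog "input_${SLURM_ARRAY_TASK_ID}.dat"
</code>

Each copy of an array job gets its own allocation and appears as a separate entry in the queue.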
You can run programs using **srun** but you need to take action to prevent your program stopping on "hangup", e.g. when you log out or your connection drops.
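One common way of doing that (a general approach, not a site-specific recommendation) is to run srun under **nohup** and put it in the background:

<code bash>
# keep srun running even if the terminal hangs up; "my_prog" is a placeholder
nohup srun ./my_prog > my_prog.log 2>&1 &
</code>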
You can also run programs using **salloc**. This is interesting since it allows you to make an allocation of resources (nodes and cores) and then run programs within that allocation interactively using **srun**. It is subject to the same "hangup" issue as srun.

<code bash>
salloc -n 5      # interactively allocate 5 tasks (one core each)
srun hostname    # runs hostname once per task, inside the allocation
</code>

By default when you submit a job it will be allocated one core and 8GB of RAM on a compute node. This core and memory are then unavailable to other cluster users.

**Files:** ex1.bash

To alter your allocation you can use the **-c**, **-n**, **-N**, and **--exclusive** options (the long versions of these options are: --cpus-per-task, --ntasks, --nodes, and --exclusive).
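These options can be given either on the sbatch command line or as #SBATCH lines at the top of your job script. A sketch (the script and program names are placeholders, not the contents of ex1.bash):

<code bash>
#!/bin/bash
#SBATCH -c 2            # two cores for this single task
#SBATCH --mem=4G        # 4GB of RAM instead of the default 8GB

./my_prog input.dat     # "my_prog" is a placeholder for your own program
</code>

Options given on the command line, e.g. **sbatch -c 2 --mem=4G myjob.bash**, override the corresponding #SBATCH lines in the script.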
----
=== -c, --cpus-per-task ===
**cpus==cores** in our configuration.
This option sets how many cores each of your tasks will use. (By default you are running one task.) Since the nodes in the **standard** partition have a maximum of 32 cores you can't get an allocation of more than 32 cores per task (in that partition). The job would be rejected when you try to submit it.
To get an entire node to yourself, regardless of the number of cores on the node, you can use the **--exclusive** option.
You should use the **-c** option if you are running a job that allows you to specify how many threads to run (or cores to use). So, if you have an option on the program you are running to say "use 8 cores" you should also tell SLURM that your program is using 8 cores. If you don't, the node you are running on may get a lot more threads running on it than it has cores, which slows down everyone's jobs on that node.
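A sketch of matching the two settings (the program name and its --threads option are made-up examples; the SLURM_CPUS_PER_TASK variable is set by SLURM when -c is used):

<code bash>
#!/bin/bash
#SBATCH -c 8            # this single task will use 8 cores

# Match the program's thread count to what SLURM allocated.
my_threaded_prog --threads "$SLURM_CPUS_PER_TASK" input.dat
</code>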
----
+ | |||
+ | === --exclusive === | ||
+ | |||
+ | Request all the cores on a node. This will set the allocation of cores on that node to however many cores there are in total on the node. Since you likely want all the memory on the node as well, you should likely specify " | ||
+ | |||
----
+ | |||
+ | === --mem === | ||
+ | |||
+ | Use the --mem option to request memory for your job. e.g. --mem=25G will request 25GB of RAM. You can, and should, request less than your default allocation of 8GB if you don't need 8GB. This frees up that memory for other users. | ||
+ | |||
+ | " | ||
=== -n, --ntasks ===
**This option is useful in very specific circumstances.**
This option specifies the number of tasks you will be running (maximum). This can spread your allocation across multiple nodes: **-n 40** must use more than one node at one core per task (in the standard partition).

It behaves differently between sbatch and srun.

For sbatch it will allocate the number of cores specified, possibly across multiple nodes, and run your script once, on the first of those nodes. It is then up to your script to make use of the other allocated cores.

For srun, it will cause one copy of your task to be run on each allocated core, i.e. it will run multiple copies of the same task for the one srun command.
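A minimal sketch showing the srun behaviour inside an sbatch allocation:

<code bash>
#!/bin/bash
#SBATCH -n 4     # four tasks, one core each, possibly spread across nodes

# srun inherits the allocation and starts one copy of the command per task,
# so this prints four hostnames (not necessarily four different ones).
srun hostname
</code>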
----
=== -N, --nodes ===
**This option is useful in very specific circumstances.**
This option specifies how many nodes you want your tasks to run on. It does not allocate whole nodes to you. Using **-N 2** with none of the other relevant options would give you 1 core on each of two different nodes.

But, for sbatch, it will be up to your task to make sure that core is actually used. For srun, your task will be run as many times as nodes you specified: it will be up to you to make sure your tasks don't all do exactly the same thing.
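A sketch of the srun case (SLURM_NODEID is a SLURM-set variable, 0 or 1 here, that lets each copy do something different):

<code bash>
#!/bin/bash
#SBATCH -N 2     # one core on each of two different nodes

# srun runs the command once per node in this allocation.
srun bash -c 'echo "copy $SLURM_NODEID running on $(hostname)"'
</code>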
----