===== Enforcing Memory Allocation =====
  
On the cluster, Slurm is configured to treat memory as a "consumable resource". That means you should be careful to specify how much memory your job needs. By default your job will be allocated 8GB per core that it has requested; since 1 core is allocated by default, a job that requests nothing more receives 8GB of memory. If your job attempts to use more than its memory allocation **it will be killed by the operating system**.
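For example (the core count here is illustrative), a job that requests 4 cores is allocated 4 x 8GB = 32GB by default:

<code>
# Request 4 CPU cores; with no explicit memory request the job
# gets the default 8GB per core, i.e. 4 x 8GB = 32GB in total.
sbatch --cpus-per-task=4 my_script.bash
</code>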
  
You can change the amount of memory your job is allocated using either the **--mem** option or the **--mem-per-cpu** option on **sbatch**.
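For example (the memory sizes here are illustrative):

<code>
# Request 16GB of memory for the job as a whole:
sbatch --mem=16G my_script.bash

# Request 4GB of memory per allocated CPU core:
sbatch --mem-per-cpu=4G my_script.bash
</code>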
  
You can request all memory on a node (whatever amount that is) using:

<code>
sbatch --mem=0 my_script.bash
</code>

If you have no idea how much memory your job will use, but are convinced that it will use more than the default amount (8GB per core allocated to the job), then you should run at least one test case using the **--exclusive** option and the "--mem=0" trick to **sbatch**, which together allocate you a whole node and all of its memory. When your job completes you can then use the **sacct** command to find out how much memory it used.
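A minimal test run along those lines, reusing the script name from the examples above:

<code>
# Take a whole node and all of its memory for one test run.
sbatch --exclusive --mem=0 my_script.bash
</code>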

If you have previously run jobs on the old cluster (before the 2021 update), you can find out how much memory they used by using the **sacct** command. You can either find the job id of a completed job on the old cluster (perhaps from the slurm-NNNNNN.out file name) and then run a command like this:
  
<code>
sacct -j NNNNNN --format=JobID,AllocCPUS,elapsed,MaxRSS
</code>

Or you can select jobs by user, start date, and script name:

<code>
sacct --user=chris -S 2020-01-01 --name=myscript.bash --format=JobID,AllocCPUS,elapsed,MaxRSS
</code>
  
The output of this command could be parsed to get the maximum amount of memory used by this particular type of job. (If you use the same script name for multiple different jobs, this will mix them up.)
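A rough sketch of such parsing, assuming the MaxRSS values are reported in kilobytes (e.g. "123456K"):

<code>
# Print the largest MaxRSS value (in kilobytes) across matching jobs.
sacct --user=chris -S 2020-01-01 --name=myscript.bash \
      --format=MaxRSS --noheader --parsable2 \
  | grep K | tr -d 'K' | sort -n | tail -1
</code>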
  