===== Enforcing Memory Allocation =====
On the cluster, memory is a SLURM consumable resource. That means you should be careful to specify how much memory your job needs. By default, your job is allocated 8GB per core it has requested; since 1 core is allocated by default, a job that requests nothing else gets 8GB of memory. If your job attempts to use more than its memory allocation **it will be killed by the operating system**.
You can change the amount of memory your job is allocated using either the **--mem** option or the **--mem-per-cpu** option of sbatch.
<code>
sbatch --mem=20G my_script.bash
</code>
or

<code>
sbatch -c 2 --mem-per-cpu=10G my_script.bash
</code>
+ | |||
+ | You can request all memory on a node (whatever amount that is) using: | ||
+ | |||
+ | < | ||
+ | sbatch --mem=0 my_script.bash | ||
+ | </ | ||
+ | |||
If you have no idea how much memory your job will use, but are convinced that it will use more than the default amount (8GB per core allocated to the job), then you should run at least one test case using the **--exclusive** option, and afterwards check how much memory the job actually used with the **sacct** command described below.
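A minimal sketch of such a test run, assuming your submission script is called my_script.bash as in the examples above (the script name is a placeholder):

```shell
# Submit one test run with exclusive access to a whole node, so the job
# has all of the node's memory available while you measure its real usage.
# my_script.bash is a placeholder for your own submission script.
sbatch --exclusive my_script.bash
```

Once the test job has completed, query its memory high-water mark with **sacct** and use that (plus some headroom) as the **--mem** value for production runs.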
+ | |||
+ | If you have previously run jobs on the old cluster (before the 2021 update), you can find out how much memory they used by using the **sacct** command. You can either find the job id of a completed job on the old cluster (perhaps from the slurm-NNNNNN.out file name) and then run a command like this: | ||
+ | |||
+ | < | ||
+ | sacct -j NNNNNN | ||
+ | </ | ||
Or you can use your user name and the job name (which defaults to the name of the script you submitted) and get a list of all jobs with the same name as follows. This particular command looks for jobs starting from Jan 1st 2020 (the default start time is the most recent midnight, so you will likely want to specify this option).
<code>
sacct --user=chris -S 2020-01-01 --name=myscript.bash --format=JobID,Elapsed,MaxRSS
</code>
The output of this command could be parsed to get the maximum amount of memory used by this particular type of job. (If you use the same script name for multiple different jobs, this will mix them up.)
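One way to do that parsing, as a sketch: pipe parsable sacct output (the sample lines below are made-up stand-ins for real sacct output, in JobID|Elapsed|MaxRSS form with the -P pipe-delimited format) through awk to keep the largest MaxRSS value:

```shell
# Fabricated sample of "sacct ... --format=JobID,Elapsed,MaxRSS -P --noheader"
# output; in real use, replace the printf with the sacct command itself.
printf '12345|01:02:03|1523456K\n12346|00:40:10|9876543K\n12347|02:00:00|234567K\n' |
  awk -F'|' '{ sub(/K$/, "", $3); if ($3+0 > max) max = $3+0 }
             END { printf "max MaxRSS: %d K\n", max }'
# prints: max MaxRSS: 9876543 K
```

This reports the high-water mark across all listed jobs; adding a safety margin to that number gives a sensible **--mem** request.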
+ | |||