Enforcing Memory Allocation

On the old cluster, memory was not treated as a SLURM “consumable resource”. A job could use as much memory as it wanted, which occasionally caused nodes to run out of memory.

On the new cluster, memory is a SLURM consumable resource, so you should be careful to specify how much memory your job needs. By default your job is allocated 8GB per requested core; since a job requests 1 core by default, that means 8GB of memory in the simplest case. If your job attempts to use more than its memory allocation, it will be killed by the operating system.
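
For example, a sketch of a job script (the script contents and program name are placeholders) that requests 4 cores and relies on the default allocation; under the 8GB-per-core default described above it would be allocated 32GB in total:

#!/bin/bash
#SBATCH --cpus-per-task=4   # 4 cores requested; with the 8GB-per-core default this gives 32GB
./my_program                # placeholder for your actual program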

You can change the amount of memory allocated to your job using either the “--mem” option or the “--mem-per-cpu” option of sbatch.
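
A minimal sketch of how these options might appear in a batch script (the script contents and program name are placeholders; use whichever one option suits your job):

#!/bin/bash
#SBATCH --mem=16G            # request 16GB for the whole job
##SBATCH --mem-per-cpu=4G    # ...or, alternatively, request 4GB per allocated core
./my_program                 # placeholder for your actual program

The same options can also be given on the command line, e.g. sbatch --mem=16G myscript.bash.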

If you have previously run jobs on the old cluster, you can find out how much memory they used with the sacct command. One option is to find the job id of a completed job on the old cluster (perhaps from the slurm-NNNNN.out file name) and run a command like this:

sacct -j NNNNN -o elapsed,MaxRSS

Alternatively, you can use your user name and the job name (which defaults to the name of the script you submitted) to get a list of all jobs with that name, as follows. This particular command looks for jobs starting from January 1st 2020 (the default start time is the most recent midnight, so you will likely want to specify this option).

sacct --user=chris -S 2020-01-01 --name=myscript.bash --format=JobID,AllocCPUS,elapsed,MaxRSS

The output of this command can be parsed to get the maximum amount of memory used by this particular type of job. (If you use the same script name for several different jobs, this will mix them together.)
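
As an illustration (not a supported tool), the MaxRSS column could be extracted and sorted with standard shell utilities. This sketch assumes the command above and that MaxRSS values carry a trailing “K” (kilobyte) suffix; sacct may use other suffixes for larger jobs:

# print the largest MaxRSS (in KB) among matching jobs
sacct --user=chris -S 2020-01-01 --name=myscript.bash \
      --format=MaxRSS --noheader --parsable2 \
  | grep -v '^$' \
  | sed 's/K$//' \
  | sort -n \
  | tail -1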
