Memory Use Monitoring

From the Command Line (on the head node)

From the head node you can use the sstat command to check the memory use (and other details) of your running jobs.

sstat --format=JobID,MaxRSS 1234.batch

The “.batch” part of the command above is literal (i.e. it is not a placeholder for some other piece of information; actually type “.batch”), while 1234 stands for your own job ID. This selects the batch step, that is, the memory used by the batch script you submitted to Slurm. If your job uses job steps you can replace the word “batch” with the job-step number, or give just the job ID with the “--allsteps” option to get details for all job steps.
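For scripted checks, a minimal Python sketch (not one of the site's provided files) can run the same query with sstat's --parsable2 and --noheader options, so that each output line is JobID|MaxRSS, and convert MaxRSS to bytes. It assumes MaxRSS is printed as a number with an optional K, M, G or T suffix in powers of 1024; the function names are hypothetical and 1234 is a placeholder job ID.

```python
import subprocess

# Assumed multipliers for the unit suffix sstat may append to MaxRSS.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def rss_to_bytes(value: str) -> int:
    """Convert a MaxRSS string such as '524288K' or '1.5G' to bytes."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper() if value[-1].isalpha() else ""
    number = value[:-1] if suffix else value
    return int(float(number) * _UNITS[suffix])


def parse_sstat(text: str) -> dict:
    """Parse 'JobID|MaxRSS' lines (sstat --parsable2 --noheader) into {step: bytes}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        job_step, max_rss = line.split("|")[:2]
        result[job_step] = rss_to_bytes(max_rss)
    return result


def max_rss(job: str, all_steps: bool = False) -> dict:
    """Query sstat for a running job step such as '1234.batch', or all steps of '1234'."""
    cmd = ["sstat", "--noheader", "--parsable2", "--format=JobID,MaxRSS", "-j", job]
    if all_steps:
        cmd.append("--allsteps")
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sstat(out)


if __name__ == "__main__":
    for step, nbytes in max_rss("1234.batch").items():
        print(f"{step}: {nbytes / 1024**2:.1f} MiB")
```

Calling max_rss("1234", all_steps=True) corresponds to the --allsteps form described above.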

Without the format option:

sstat 1234.batch

will print many more fields for that step, including memory and CPU usage.
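That output is wide, so reading it programmatically can help. A sketch under the same assumptions (hypothetical function names, placeholder job 1234) adds --parsable2, which separates fields with “|”, and pairs each header name with the corresponding value:

```python
import subprocess


def parse_parsable(text: str) -> list:
    """Turn '|'-delimited sstat output with a header line into one dict per step."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split("|")
    # zip() pairs each field name with the value in the same column.
    return [dict(zip(header, line.split("|"))) for line in lines[1:]]


def sstat_all_fields(job: str) -> list:
    """Run sstat for one job step, keeping its default set of fields."""
    out = subprocess.run(
        ["sstat", "--parsable2", "-j", job],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_parsable(out)


if __name__ == "__main__":
    for step in sstat_all_fields("1234.batch"):
        for field, value in step.items():
            print(f"{field:>20}: {value}")
```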

Note: sstat currently only works on the main cluster, not on crystallineentity.

From the Node

Whole Node Monitoring

Suppose you have a program to run that you expect to use a lot of memory, but you are unsure exactly how much.

  • Run it on a node reserved exclusively for you.
    • Use the --exclusive option to get a whole node.
  • Run a background program to monitor memory use on that node.
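A background monitor of this kind can be as small as a loop that samples /proc/meminfo. The minimal Python sketch below assumes a Linux node and counts used memory as MemTotal minus MemAvailable (MemAvailable needs kernel 3.14 or later). It appends one timestamped line per sample; the function names, interval and log format are illustrative, not those of the memmon script.

```python
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo text into {field: number}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def used_kb(info: dict) -> int:
    """Memory in use on the node, taken as MemTotal - MemAvailable (kB)."""
    return info["MemTotal"] - info["MemAvailable"]


def monitor(log_path: str, interval: float = 5.0, samples=None) -> None:
    """Append 'epoch_seconds used_kB total_kB' to log_path every interval seconds.

    Runs until killed when samples is None.
    """
    count = 0
    while samples is None or count < samples:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        with open(log_path, "a") as log:
            log.write(f"{time.time():.0f} {used_kb(info)} {info['MemTotal']}\n")
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)


if __name__ == "__main__":
    monitor("memlog")
```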

Files: memmon, memlog, leakmem

srun memmon
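If the monitor should stop as soon as the program being measured finishes, the launch can be wrapped in a small helper. This is a hypothetical Python sketch, not how memmon itself is started: it starts a monitor command in the background, runs the workload in the foreground, then terminates the monitor and returns the workload's exit status.

```python
import subprocess
import sys


def run_with_monitor(monitor_cmd, workload_cmd, grace: float = 5.0) -> int:
    """Run workload_cmd while monitor_cmd runs in the background; return its exit status."""
    monitor = subprocess.Popen(monitor_cmd)
    try:
        return subprocess.run(workload_cmd).returncode
    finally:
        # Stop the monitor politely (SIGTERM on POSIX), then force it if needed.
        monitor.terminate()
        try:
            monitor.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            monitor.kill()
            monitor.wait()


if __name__ == "__main__":
    # Hypothetical usage: python run_with_monitor.py ./memmon ./leakmem
    sys.exit(run_with_monitor([sys.argv[1]], sys.argv[2:]))
```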

leakmem allocates 1 MB of RAM every second (for up to 10000 seconds), giving a steadily growing memory footprint to monitor.
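For experimenting without the original leakmem, a Python stand-in with the same behaviour might look like this (a sketch; the real leakmem's language and source are not shown here). It writes one byte per memory page so each allocation shows up as resident memory (RSS) rather than only as reserved address space.

```python
import mmap
import time

MB = 1024 * 1024
PAGE = mmap.PAGESIZE  # writing one byte per page makes the whole page resident


def leak(steps: int = 10000, interval: float = 1.0, block_size: int = MB) -> list:
    """Allocate block_size bytes every interval seconds, for up to `steps` rounds.

    Every block is kept in `held`, so nothing is freed until the process exits.
    """
    held = []
    for _ in range(steps):
        block = bytearray(block_size)
        for offset in range(0, block_size, PAGE):
            block[offset] = 1  # touch the page so it is backed by RAM
        held.append(block)
        time.sleep(interval)
    return held


if __name__ == "__main__":
    leak()
```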

We use the SLURM_JOB_ID environment variable to get a unique name for the log file.
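A minimal sketch of that naming scheme in Python, assuming a <prefix>.<job id> pattern (the exact name memmon writes is not stated here) and falling back to the process ID when run outside a Slurm job:

```python
import os


def log_name(prefix: str = "memlog") -> str:
    """Return '<prefix>.<SLURM_JOB_ID>', or '<prefix>.pid<PID>' outside a Slurm job."""
    job_id = os.environ.get("SLURM_JOB_ID")
    if job_id:
        return f"{prefix}.{job_id}"
    # Not inside a Slurm job: use the process ID so names stay distinct.
    return f"{prefix}.pid{os.getpid()}"


if __name__ == "__main__":
    print(log_name())
```

Inside job 1234 this returns memlog.1234, so concurrent jobs never write to the same log.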

Single Program Monitoring

/proc/PID/smaps (with PID replaced by the process ID) gives a detailed, per-mapping view of the memory used by a specific process. You can parse it yourself, or use previously written Perl code such as Linux::Smaps.pl.
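Parsing by hand is straightforward because each mapping's header line is followed by lines of the form “Field: value kB”. A minimal Python sketch (a substitute for, not a port of, the Perl code mentioned above) sums those fields over all mappings; reading another user's process usually requires extra permissions.

```python
import os
import sys
from collections import defaultdict


def parse_smaps(text: str) -> dict:
    """Sum every 'Field: value kB' line in /proc/PID/smaps text; totals are in kB."""
    totals = defaultdict(int)
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like 'Rss:   1234 kB'; mapping headers and VmFlags do not.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            totals[parts[0][:-1]] += int(parts[1])
    return dict(totals)


def smaps_totals(pid: int) -> dict:
    with open(f"/proc/{pid}/smaps") as f:
        return parse_smaps(f.read())


if __name__ == "__main__":
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    totals = smaps_totals(pid)
    for field in ("Rss", "Pss", "Private_Dirty", "Swap"):
        print(f"{field:>14}: {totals.get(field, 0)} kB")
```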

Files: memmon2, memlog2, leakmem.

Last modified: 2017/10/18 15:10 by root