memory_use_monitoring

Differences

This shows you the differences between two versions of the page.

memory_use_monitoring [2020/08/14 16:25]
root
memory_use_monitoring [2021/02/26 12:48] (current)
root
Line 12: Line 12:
  
 MaxRSS is the maximum amount of memory your job used. If your job runs across multiple nodes (not common with the software generally in use on our cluster), the value is given for a specific node.
 +
 +The **sstat** command also works on the nodes themselves. To record the maximum amount of memory your job used, put the **sstat** command above as the last line of the script that you submit. The job ID it needs is available in the SLURM_JOB_ID environment variable.
 +
 +<code>
 +#!/bin/bash
 +
 +... Commands needed to do your processing here ...
 +
 +sstat --format=JobID,MaxRSS ${SLURM_JOB_ID}.batch
 +</code>
 +
 +The output of the sstat command will then be written into the **slurm-NNNN.out** output file of your job.
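
If an earlier command in the script fails, a final **sstat** line may never be reached. The following is a minimal sketch of one way around that, assuming a bash job script: it wraps the same **sstat** call in a function and registers it with bash's ''trap ... EXIT'', so the report is attempted whenever the script exits. The function name ''report_max_rss'' is hypothetical and not part of SLURM.

```shell
#!/bin/bash

# Hypothetical helper (not part of SLURM): report peak memory for the
# batch step of the current job.
report_max_rss() {
    # Skip quietly when not running under SLURM, e.g. when testing the
    # script on a workstation where sstat or SLURM_JOB_ID is missing.
    if [ -z "${SLURM_JOB_ID:-}" ] || ! command -v sstat >/dev/null 2>&1; then
        echo "report_max_rss: not inside a SLURM job, skipping" >&2
        return 0
    fi
    sstat --format=JobID,MaxRSS "${SLURM_JOB_ID}.batch"
}

# Run the report when the script exits, even if an earlier command failed.
trap report_max_rss EXIT

# ... Commands needed to do your processing here ...
```

The trap only fires when bash itself gets to exit normally. If the job is killed outright, for example with SIGKILL, no report will be written.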
  
 Without the format option:
Line 21: Line 33:
 will give you a lot of details about your job including memory and CPU usage.
  
 +==== For Completed Jobs ====
 +
 +You can look up details of jobs that you ran in the past using the **sacct** command. For instance:
 +
 +<code>
 +sacct -j 1031100 -o "JobID,JobName,MaxRSS,Elapsed"
 +</code>
 +
 +would report on job number 1031100, giving the job name, maximum memory used, and elapsed time that the job took to run.
 +
 +You will likely have a record of your job numbers in the names of the slurm-NNNNNN.out files that record the output of your jobs.
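
As a rough sketch of how those file names could be used, the script below collects the job IDs from the ''slurm-*.out'' files in a directory and runs the same **sacct** query for each one. It assumes the output files keep the ''slurm-<jobid>.out'' naming pattern and sit in the current directory. The function name ''job_ids_from_outputs'' is hypothetical.

```shell
#!/bin/bash

# Hypothetical helper: print the job IDs embedded in slurm-<jobid>.out
# file names found in a directory (default: current directory).
job_ids_from_outputs() {
    local dir="${1:-.}" f base
    for f in "$dir"/slurm-*.out; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal
        base="${f##*/}"              # strip the directory part
        base="${base#slurm-}"        # strip the "slurm-" prefix
        echo "${base%.out}"          # strip the ".out" suffix
    done
}

# Query sacct for every job found, but only where sacct is installed.
if command -v sacct >/dev/null 2>&1; then
    for id in $(job_ids_from_outputs .); do
        sacct -j "$id" -o "JobID,JobName,MaxRSS,Elapsed"
    done
fi
```

Pass a different directory to ''job_ids_from_outputs'' if your output files are stored elsewhere.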
  
 ==== From the Node ====
Line 31: Line 54:
     * Use the **--exclusive** option to get a whole node.
  
 +  * Use **sstat** as described above.
   * Run a background program to monitor memory use on that node.
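
One possible shape for such a background program is sketched below. It is an illustration only, assuming a Linux node where ''/proc/meminfo'' is readable. The function names ''used_mem_mib'' and ''monitor_memory'' and the log file name are hypothetical. Note that it records memory in use on the whole node, not just by your job, which is why having the node to yourself matters.

```shell
#!/bin/bash

# Hypothetical helper: print memory in use on this node, in MiB,
# computed as MemTotal - MemAvailable from /proc/meminfo (Linux only).
used_mem_mib() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END {printf "%d\n", (t-a)/1024}' /proc/meminfo
}

# Hypothetical helper: append "<unix time> <used MiB>" to a log file
# every INTERVAL seconds until the process is killed.
monitor_memory() {
    local logfile="$1" interval="${2:-10}"
    while true; do
        echo "$(date +%s) $(used_mem_mib)" >> "$logfile"
        sleep "$interval"
    done
}

# Start the monitor in the background only when inside a SLURM job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    monitor_memory "memory-${SLURM_JOB_ID}.log" 10 &
    monitor_pid=$!
    # Stop the monitor when the job script exits.
    trap 'kill "$monitor_pid" 2>/dev/null' EXIT
fi

# ... Commands needed to do your processing here ...
```

The largest value in the second column of the log file gives an estimate of peak memory use on the node while the job ran, to within the sampling interval.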
  
memory_use_monitoring.1597436734.txt.gz · Last modified: 2020/08/14 16:25 by root