This shows you the differences between two versions of the page.
memory_use_monitoring [2017/06/09 10:38] root |
memory_use_monitoring [2021/02/26 12:48] (current) root |
||
---|---|---|---|
Line 10: | Line 10: | ||
The " | The " | ||
+ | |||
+ | MaxRSS is the maximum amount of memory your job used. If your job runs across multiple nodes, the value is given for a specific node; multi-node jobs are not common with the software generally in use on our cluster. | ||
+ | |||
+ | The **sstat** command also works on the nodes themselves. So, to record the maximum amount of memory your job used, you could put the **sstat** command above as the last line of the script that you submit. You can get the required job ID from the SLURM_JOB_ID environment variable. | ||
+ | |||
+ | < | ||
+ | #!/bin/bash | ||
+ | |||
+ | ... Commands needed to do your processing here ... | ||
+ | |||
+ | sstat --format=JobID, | ||
+ | </ | ||
+ | |||
+ | The output of the sstat command will then be written into the **slurm-NNNN.out** output file of your job. | ||
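As a minimal sketch of that approach, the helper below wraps the **sstat** call so that it fails with a clear message when it is run outside a job. The function name `report_peak_memory` is hypothetical, the field list `JobID,MaxRSS` is an assumed choice of columns, and the `.batch` step suffix assumes the commands in your script run in the batch step rather than under `srun`.

```shell
#!/bin/bash

# Hypothetical helper: print peak memory for the running job's batch step.
# Assumes sstat is available on the compute node and that SLURM_JOB_ID is
# set by Slurm when the script runs as a job.
report_peak_memory() {
    local job_id="${SLURM_JOB_ID:-}"

    if [ -z "$job_id" ]; then
        echo "SLURM_JOB_ID is not set; not running inside a Slurm job" >&2
        return 1
    fi

    if ! command -v sstat >/dev/null 2>&1; then
        echo "sstat was not found on this node" >&2
        return 1
    fi

    # Assumed field list; adjust --format to the columns you need.
    sstat --format=JobID,MaxRSS -j "${job_id}.batch"
}
```

Calling `report_peak_memory` as the final line of the job script would write the result into the job's output file alongside the rest of the job's output.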
Without the format option: | Without the format option: | ||
Line 18: | Line 32: | ||
will give you a lot of details about your job including memory and CPU usage. | will give you a lot of details about your job including memory and CPU usage. | ||
+ | |||
+ | ==== For Completed Jobs ==== | ||
+ | |||
+ | You can look up details of jobs that you ran in the past using the **sacct** command. For instance: | ||
+ | |||
+ | < | ||
+ | sacct -j 1031100 -o " | ||
+ | </ | ||
+ | |||
+ | would report on job number 1031100, giving the job name, maximum memory used, and elapsed time that the job took to run. | ||
+ | |||
+ | You will likely have a record of your job numbers in the names of the slurm-NNNNNN.out files that record the output of your jobs. | ||
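When comparing several past jobs, it can help to convert the MaxRSS values to one unit. The sketch below is one way to do that; the function name `maxrss_to_mib` is hypothetical, and it assumes that **sacct** prints MaxRSS with a K, M, G or T suffix in multiples of 1024, and that a value with no suffix is in bytes.

```shell
#!/bin/bash

# Hypothetical helper: convert a MaxRSS value such as "2048K" or "1.5G"
# to whole MiB (truncated). Assumes suffixes are multiples of 1024 and
# that a bare number is a byte count.
maxrss_to_mib() {
    awk -v v="$1" 'BEGIN {
        unit = substr(v, length(v), 1)   # last character, e.g. "K"
        num = v + 0                      # awk keeps the leading numeric part
        if (unit == "K")      mib = num / 1024
        else if (unit == "M") mib = num
        else if (unit == "G") mib = num * 1024
        else if (unit == "T") mib = num * 1024 * 1024
        else                  mib = num / (1024 * 1024)
        printf "%d\n", mib
    }'
}

# Example use with a value copied from sacct output:
#   maxrss_to_mib 2048K
```

The value to convert can be copied from the MaxRSS column of the **sacct** report for the job you are interested in.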
==== From the Node ==== | ==== From the Node ==== | ||
Line 28: | Line 54: | ||
* Use the **--exclusive** option to get a whole node. | * Use the **--exclusive** option to get a whole node. | ||
+ | * Use **sstat** as described above. | ||
* Run a background program to monitor memory use on that node. | * Run a background program to monitor memory use on that node. | ||