etiquette

Differences

This shows you the differences between two versions of the page.

etiquette [2021/07/01 17:03]
root
etiquette [2022/01/26 10:26] (current)
root
Line 10: Line 10:
     * Compilation.     * Compilation.
     * ...     * ...
- +  * Do not run long-running (multi-day) processes on the head node.
-  * Try not to bog a compute node down when other users' jobs are running on that node. +    * From time to time the head node will be rebooted to update the OS kernel. A reboot of the head node will not affect jobs running on the compute nodes, but will, of course, kill off any processes running on the head node itself.
-    * Using too much memory. +    * (This isn't really an etiquette issue, just a warning. A required reboot of the head node will not be delayed for processes running on the head node.)
-    * Using way more cores than you said you would on the job submission.  +
-    * If in doubt take whole node (using **--exclusive**) and monitor progress (example later).+
  
   * Try not to queue hundreds of jobs that will take up the entire cluster.   * Try not to queue hundreds of jobs that will take up the entire cluster.
Line 26: Line 24:
     * If they take a really long time, it definitely isn't OK.     * If they take a really long time, it definitely isn't OK.
     * And there's a grey area in the middle.     * And there's a grey area in the middle.
-      * As a guideline, think twice before taking up more than 4 whole nodes for multiple days. +      * As a guideline, think twice before taking up more than 90 cores for a long time (multiple days).
       * Justification:       * Justification:
         * Usually there are 10-15 people running jobs on the cluster.         * Usually there are 10-15 people running jobs on the cluster.
-        * The cluster has 600 cores (roughly). +        * The cluster has 1000 cores (roughly). 
-        * So, the "fair share" per person is 40-60 cores+        * So, the "fair share" per person is 60-100 cores.
-        * 4 nodes is about 48 cores (of the standard nodes).+
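
The fair-share arithmetic in the justification above can be checked with a few lines of Python. This is only a sketch: the function name `fair_share` is made up for illustration, and the core and user counts are the rough figures quoted on this page, not measured values.

```python
def fair_share(total_cores, active_users):
    """Return an even per-user split of the cluster's cores, rounded down."""
    return total_cores // active_users


# Rough figures from the text: about 1000 cores, 10-15 active users.
low = fair_share(1000, 15)
high = fair_share(1000, 10)
print(f"{low}-{high} cores per person")  # prints "66-100 cores per person"
```

An even split gives 66 to 100 cores, which the page rounds to 60-100. The 90-core guideline sits near the top of that range.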
  
-  * Don't expect to unzip (or zip) a large number of files more quickly by sending the unzip commands to multiple nodes. All that does is swamp the file server you are using slowing things down for you and everyone else.+  * Do not use the GPU and bigmem partitions (queues) unless you actually need those particular resources. 
 +    * It is OK to use the nodes in these partitions by submitting jobs to the short partition (which contains all nodes). The short partition has a 5 hour limit on job run time. 
 + 
 +  * Don't expect to unzip (or zip) a large number of files more quickly by sending the unzip commands to multiple nodes. All that does is swamp the file server you are using, slowing things down for you and everyone else.
     * It will probably be just as fast to run the unzip commands sequentially, or at most two at a time.     * It will probably be just as fast to run the unzip commands sequentially, or at most two at a time.
  
Line 42: Line 42:
  
   * On the new cluster you must specify how many cores you want to use, and how much memory your job needs.   * On the new cluster you must specify how many cores you want to use, and how much memory your job needs.
-    * Try not to over-specify i.e. don't ask for 20GB if your job only needs 5GB. +    * Try not to over-specify, i.e. don't ask for 50GB if your job only needs 5GB.
 + 
 +  * Use space on /scratch if you can. 
 +    * If you are downloading data from an external database (e.g. NCBI) that can just be downloaded again if necessary, then put it on /scratch. 
 +    * This reduces pressure on the amount of space on the other disk volumes, and on the backup system. 
 + 
 +  * Check whether there is sufficient free space on the disk volume on which you are working before downloading or generating large amounts of data. 
 +    * Filling up a disk volume will mean that your jobs will not complete properly (they will not be able to write output to disk), and may have the same effect on other users' jobs. 
 +    * You can use the "df" command for this, e.g. "df -H /home5".
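
The same check can be made from inside a script before a large download or run starts. The sketch below uses Python's standard `shutil.disk_usage`; the helper name `has_room` and the 50 GB figure are illustrative assumptions, and `"."` stands in for whichever volume (such as /home5) you are working on.

```python
import shutil


def has_room(path, needed_bytes):
    """Return True if the volume holding `path` has at least `needed_bytes` free."""
    return shutil.disk_usage(path).free >= needed_bytes


# Hypothetical requirement: about 50 GB of output expected.
needed = 50 * 10**9
if not has_room(".", needed):
    print("Not enough free space; consider /scratch or cleaning up first.")
```

This only reports free space at the moment of the check. Other users' jobs may fill the volume afterwards, so it is a courtesy check rather than a guarantee.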
  
   * Consider using the HPC Center BRC queue.   * Consider using the HPC Center BRC queue.
etiquette.1625173396.txt.gz · Last modified: 2021/07/01 17:03 by root