===== Update 2021 Overview =====
  
The cluster is being updated to Ubuntu 20.04. Along with the OS update comes several changes to the way the cluster works. This page summarizes those changes. Workshops were held in June and July of 2021 to discuss these changes in detail.
  
Rather than shutting the cluster down and doing the upgrade in one swell foop, we are providing a new cluster (with a new login node) already upgraded to Ubuntu 20.04 "Focal Fossa". You can move to the new cluster when it is convenient (although you will have to move in the near future - defined as the next 3-4 months).
  
**From September 6th, no new jobs will be allowed on the old cluster. Jobs which are already running on that date will be allowed to continue running, but the old cluster now has a 30-day time limit on all queues.**

Most of the old cluster nodes have now been moved to the new cluster. The new cluster includes 6 brand new compute nodes to which you can submit jobs using slurm as usual. These 6 nodes each have:
    
  
**There are some important differences in the slurm submission configuration described below.**
  
When you log in to the new cluster you will have access to your home directory exactly as on the current head node. **Unfortunately that does not mean that all the software you have installed in your home directory will continue to work**: the new OS has updated shared libraries that may or may not be compatible with the programs you have installed. So you will need to spend some time testing and/or re-installing the software you need. (See the paragraph below about scratch space that could be used for testing software updates.)
  
The nodes on the old cluster will be slowly disappearing from that cluster and reappearing on the new cluster. The plan is to move one or two nodes per week, so with about 30 nodes to move, after a few months the old cluster will have no nodes left - hence the need for everyone to move.
    
Allison told me that we need scratch space on the cluster (i.e. disk space that is not backed up and is intended only for somewhat temporary files). To that end the new cluster has a 70TB volume mounted at /scratch, and each (active) account can have a sub-directory of /scratch for temporary download space, or runtime temporary files. Files on /scratch will be automatically deleted when they get to be 90 days old (this time limit may be changed, up or down, as we see how heavily the scratch space is used). I have created subdirectories of /scratch for all users who have been active on the old cluster in the last month. Any other user can send an e-mail to Kevin or me to request that we create a sub-directory for them.

For those of you working with sensitive data, the scratch space is encrypted "at rest". So you can use it too.
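
As an illustration, a job could stage its temporary files on scratch along these lines (a minimal sketch: the /scratch/$USER path is an assumption about how the per-user sub-directories are named, so check your own directory first):

<code>
# Use your scratch sub-directory for large temporary files
# (/scratch/$USER is an assumption - substitute your actual sub-directory)
SCRATCH_DIR=/scratch/$USER/my_job_tmp
mkdir -p "$SCRATCH_DIR"

# Many tools honor TMPDIR, so point it at scratch instead of your home directory
export TMPDIR="$SCRATCH_DIR"

# ... run your analysis here ...

# Clean up when done (anything left behind is deleted after 90 days anyway)
rm -rf "$SCRATCH_DIR"
</code>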
  
I have not attempted to install all the software from the old cluster on the new because I would like to start fresh and install only what we need. I have installed the most obvious software: R, singularity, samtools, etc. Feel free to send e-mail requesting the installation of specific software.
  
So, you should be more careful in specifying how many cores your job needs. See the "-c" or "--cpus-per-task" option to sbatch.

See [[Enforcing Core Counts]] for more details.
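
For example, a minimal job script asking slurm for 4 cores could look like the sketch below (the job name and program are placeholders, and the numbers are only an example):

<code>
#!/bin/bash
#SBATCH --job-name=example        # placeholder job name
#SBATCH -c 4                      # request 4 cores (same as --cpus-per-task=4)

# Placeholder command - pass the allocated core count on to the program itself
my_program --threads "$SLURM_CPUS_PER_TASK" input.dat
</code>

You would then submit it with "sbatch" as usual.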
  
=== Enforcing Memory Allocation ===
You can change the amount of memory your job is allocated using either the "--mem" option, or the "--mem-per-cpu" option on sbatch.
  
See [[Enforcing Memory Allocation]] for more details.
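
As a sketch (the numbers are examples, not cluster defaults), the two options look like this inside a job script:

<code>
# Use ONE of the following in a job script, not both.

# Either: request 16 GB for the whole job
#SBATCH --mem=16G

# Or: request 4 GB per core for a job that also asks for 4 cores
#SBATCH -c 4
#SBATCH --mem-per-cpu=4G
</code>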
  
==== Environment Modules ====
  
Environment modules allow you to control which software (and which version of that software) is available in your environment. For instance the new cluster has 4 different versions of standard R installed: 3.5.3, 3.6.3, 4.0.5, 4.1.0. When you first log in and try to run R the OS will respond with "command not found". To activate R in your environment you would type:
  
<code>
module add R
</code>
  
See [[Environment Modules]] for more details.
  
==== Connecting to the Updated Cluster ====

You can connect to the new cluster at either of these names:

  * captainsisko.statgen.ncsu.edu
  * brccluster2021.statgen.ncsu.edu

Use the same user name and password that you use for the old cluster.
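
For example, from a terminal on your own machine (a minimal sketch; replace "myuser" with your own cluster user name, and either of the two names above should work):

<code>
ssh myuser@captainsisko.statgen.ncsu.edu
</code>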
  