
Environment Modules

Environment modules allow you to control which software (and which version of that software) is available in your environment. For instance, at the time of writing, the cluster has four versions of standard R installed: 3.5.3, 3.6.3, 4.0.5 and 4.1.0. When you first log in and try to run R, the shell will respond with “command not found”. To make R available in your environment, type:

module add R

This gives you access to the most recent version of R available (4.1.0 in this case).

To use a specific version, type something like:

module add R/3.6.3

To get a list of all available software you can type:

module avail

To get a full list of module commands:

module --help

There's a shorthand version of the module command: ml. For instance, to load the fastp program you can type just:

ml fastp

You can also issue the other module commands using ml:

ml avail
ml list
ml purge
...

Just typing ml on its own is the same as ml list.

Whether using module or ml you can load multiple modules with a single command:

module add R/4.1.0 samtools bedtools

ml R/4.1.0 samtools bedtools
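To confirm that a load took effect, you can ask the shell which executable it now finds for a command. The following is a minimal sketch that uses only the standard command -v builtin. The function name report_tool is hypothetical, and the tool names are simply the ones used in the examples above.

```shell
# report_tool NAME: print where the shell finds NAME, or say that it is missing.
# (report_tool is a hypothetical helper name, not part of Environment Modules.)
report_tool() {
    if tool_path="$(command -v "$1" 2>/dev/null)"; then
        echo "$1 -> $tool_path"
    else
        echo "$1: not found (is the module loaded?)"
    fi
}

# Check the tools loaded in the examples above.
report_tool R
report_tool samtools
report_tool bedtools
```

Running this before and after a module add command should show the difference between a missing tool and one that has been found on your PATH.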

Where to Run Module Load Commands

You have several options as to where to use “module load” commands.

1. Put module load commands into your .bashrc or .bash_profile file.
   - Modules/programs loaded here will be available immediately upon login.
   - This is a good choice for programs you use a lot from the command line. R might be a good example.
   - The specific choices you make in your .bashrc file can be overridden in shell scripts that you run on the head node or submit to compute nodes.
2. Run module load commands as needed “by hand” (on the command line).
   - This may be OK if you usually submit jobs by running scripts, and those scripts load modules themselves.
3. Put module load commands into scripts that you run on the head node to submit jobs to the compute nodes, e.g. scripts that run sbatch commands.
   - Doing this does not affect the modules you have loaded in your login shell, and any modules loaded within the script affect only commands run within the script.
4. Put the module load commands in the scripts that you submit to the compute nodes.
   - The same comment applies as for technique 3.
5. Put the required module load commands into a text file and then “source” (bash command) the text file as necessary.
   - This might be a good way to specify what programs/modules are needed for specific pipelines that you commonly run.
   - You could have your scripts source the list of programs needed for the pipeline rather than explicitly listing module load commands in each script.
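The last technique might look like the following sketch. The file name example_pipeline_modules.sh and the function load_pipeline_modules are hypothetical, and the module names are the ones used earlier on this page. The check for the module command is there only so that the sketch does nothing harmful on a machine without Environment Modules.

```shell
# Hypothetical location for the pipeline's module list.
modules_file="${TMPDIR:-/tmp}/example_pipeline_modules.sh"

# Write the list once; each line is an ordinary module command.
cat > "$modules_file" <<'EOF'
# Modules needed by the (hypothetical) example pipeline.
module purge
module load R/4.1.0 samtools bedtools
EOF

# Source the list in the current shell so the loaded modules persist.
load_pipeline_modules() {
    if command -v module >/dev/null 2>&1; then
        . "$1"
    else
        echo "module command not available; skipped $1" >&2
        return 1
    fi
}

# In a pipeline script, this one line replaces the individual module load commands.
load_pipeline_modules "$modules_file" || echo "continuing without modules"
```

Keeping the list in one file means that a change of version for the pipeline is made in a single place rather than in every script.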

Module Conflicts

When you issue a module load command, the modules program checks whether you already have a different version of the same program loaded as a module. If you do, it reports an error and does not load the requested version.

Suppose you have the latest version of R loaded in your .bashrc file, but also have a pipeline that has been installed and thoroughly tested with a previous version of R, and you have in the scripts for that pipeline something like:

module load R/3.6.3

This would generate an error because the latest version of R is already loaded.

So, in your script you could unload R before loading the version the pipeline needs:

module unload R
module load R/3.6.3

You might also consider unloading all modules, and loading only those you need, before starting up your pipeline:

module purge
module load R/3.6.3
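A script can also check what is already loaded before deciding whether to unload anything. The sketch below assumes that the module system records the loaded modules in the LOADEDMODULES environment variable as a colon-separated list (Environment Modules does this, but it is worth checking on your system). The name loaded_version is a hypothetical helper.

```shell
# loaded_version NAME: print the loaded module entries for NAME
# (e.g. "R/4.1.0"), or nothing if NAME is not loaded.
# Assumes LOADEDMODULES is a colon-separated list of loaded modules.
loaded_version() {
    printf '%s\n' "${LOADEDMODULES:-}" | tr ':' '\n' | while IFS= read -r entry; do
        case "$entry" in
            "$1"|"$1"/*) printf '%s\n' "$entry" ;;
        esac
    done
}

wanted="R/3.6.3"
current="$(loaded_version R)"

if [ "$current" = "$wanted" ]; then
    echo "$wanted is already loaded"
else
    if [ -n "$current" ]; then
        echo "R is loaded as $current; it must be unloaded before loading $wanted"
    fi
    # Only call module where it exists, so the sketch is safe to run anywhere.
    if command -v module >/dev/null 2>&1; then
        if [ -n "$current" ]; then
            module unload R
        fi
        module load "$wanted"
    fi
fi
```

Compared with module purge, this leaves any unrelated modules in place and touches only R.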

Not using Modules

In general, the environment modules just edit your PATH variable (they usually prepend a directory to your PATH). The convention is that all programs loaded by “module load” can be found in a sub-directory of /opt. The naming convention is:

/opt/SOFTWARE/VERSION

Where SOFTWARE would be replaced by the name of the software, e.g. R, and VERSION would be replaced by a version number for that software, e.g. 3.6.3 (for R).

Usually (but not always) the executable programs will be in /opt/SOFTWARE/VERSION/bin.

You can find specifically what “module load” does for a piece of software by looking at the modulefile for that piece of software. It can be found at:

/opt/modules/modulefiles/SOFTWARE/VERSION

(This is a file - not a directory.)

If you prefer not to use modules, you can just update your PATH to include the directories of the pieces of software you want to use (possibly updating your PATH in .bashrc).
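As a rough illustration of updating PATH by hand, the sketch below prepends a directory only if it is not already present, so that sourcing .bashrc repeatedly does not keep growing PATH. The function name prepend_path is hypothetical, and the directory follows the /opt/SOFTWARE/VERSION/bin convention described above, which does not hold for every package.

```shell
# prepend_path DIR: put DIR at the front of PATH unless it is already in PATH.
# (prepend_path is a hypothetical helper name.)
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Example following the /opt/SOFTWARE/VERSION/bin convention described above.
prepend_path /opt/R/3.6.3/bin
```

Unlike module load, nothing here records what was added, so there is no equivalent of module unload; you would have to edit PATH again or start a new shell.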

In most cases, modules is just a nice easy way of updating your PATH. So it seems preferable to use the module command rather than updating your PATH explicitly.

Special Purpose Modules

There are a couple of special purpose modules.

dot
- The dot module just puts your current working directory into your path allowing you to run a program or script that is in your cwd with just “programname” rather than “./programname”.
- This can lead to confusion if you have programs or scripts with the same name in different directories and forget where you are.

use.own
- This module allows you to use modulefiles of your own.
- Try “module help use.own” for more information.

Oddities and Exceptions

Python and Perl

Since various OS-level tools need python and perl, versions of these languages are installed system-wide and need no module. These are python3 (version 3.8.5) and perl (version 5.30.0). You are welcome to use these, but there are also modules with slightly different versions:

- python 2.7.18 for older software that requires python2
- python 3.9.5
- perl 5.34.0

The system-wide python is accessible only as “python3”. When a python module is loaded just “python” will start up the relevant version of python.

Python and perl packages that users request will be installed into the module versions of these programs. You can install python and perl packages locally as you wish (using any of these versions).

More information about environment modules can be found here: https://modules.readthedocs.io/en/latest/

Installing R Packages

If you try to install an R package (as an ordinary user) and get a “permission denied” message like this:

* installing to library '/opt/R/4.1.0/lib/R/library'
Error: ERROR: no permission to install to directory '/opt/R/4.1.0/lib/R/library'

then you might need to create the directory that R uses for package installation within your home directory. For R 4.1.0 this would be:

~/R/x86_64-pc-linux-gnu-library/4.1

Where “~” means “your home directory”. You should only specify the first two parts of the full version number (hence the 4.1 for version 4.1.0).

You can create the directory from the command line, like this:

cd
mkdir -p ~/R/x86_64-pc-linux-gnu-library/4.1

Or you can do it from within R, and then you won't need to know any details like the specific version number - the R program that you have started will fill them in for you:

dir.create(Sys.getenv("R_LIBS_USER"), recursive = TRUE)
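The rule for turning a version number into a directory name can be sketched as a small shell helper. The function name r_user_lib_dir is hypothetical, and the platform string x86_64-pc-linux-gnu is taken from the example above; it may differ on other systems.

```shell
# r_user_lib_dir VERSION: print the per-user library directory for an R version,
# keeping only the first two parts of the version number (4.1.0 -> 4.1).
# The platform string is the one shown above; it may differ on other systems.
r_user_lib_dir() {
    major_minor="$(printf '%s\n' "$1" | cut -d. -f1,2)"
    printf '%s\n' "$HOME/R/x86_64-pc-linux-gnu-library/$major_minor"
}

# Print the directory for R 4.1.0.
r_user_lib_dir 4.1.0
# To create it: mkdir -p "$(r_user_lib_dir 4.1.0)"
```

The helper only prints the path; the mkdir -p command in the final comment is what would actually create the directory.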

Rscript and the "#!" Hack

If you have used

#!/usr/bin/Rscript

as the first line of your R scripts on the old cluster (so that you could run them just like programs), those scripts will no longer work on the new cluster. This is because there is no interpreter at /usr/bin/Rscript on the new cluster.

On the new cluster you should load an R module (possibly from within your .bashrc file so that R is always available when you log in), and then use:

#!/usr/bin/env Rscript

at the top of your R scripts.
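If you have many scripts to convert, the first line can be rewritten mechanically. The following is a sketch only: fix_rscript_shebang is a hypothetical helper name, and it changes a file only when the first line is exactly #!/usr/bin/Rscript (a first line with extra options is left alone). Try it on a copy before running it on real scripts.

```shell
# fix_rscript_shebang FILE...: replace a first line of exactly
# "#!/usr/bin/Rscript" with "#!/usr/bin/env Rscript"; other files are untouched.
# (fix_rscript_shebang is a hypothetical helper name.)
fix_rscript_shebang() {
    for script in "$@"; do
        if [ "$(head -n 1 "$script")" = '#!/usr/bin/Rscript' ]; then
            tmp_file="$(mktemp)" || return 1
            {
                printf '%s\n' '#!/usr/bin/env Rscript'
                tail -n +2 "$script"
            } > "$tmp_file"
            # Copy the contents back so the original file keeps its permissions.
            cat "$tmp_file" > "$script"
            rm -f "$tmp_file"
        fi
    done
}

# Demonstration on a throwaway script.
demo_script="$(mktemp)"
printf '%s\n' '#!/usr/bin/Rscript' 'cat("hello\n")' > "$demo_script"
fix_rscript_shebang "$demo_script"
head -n 1 "$demo_script"
```

The converted scripts still need an R module loaded when they run, since env looks for Rscript on your PATH.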
