This page applies only to the new cluster.
Environment modules allow you to control which software (and which version of that software) is available in your environment. For instance, the new cluster has four different versions of standard R installed: 3.5.3, 3.6.3, 4.0.5 and 4.1.0. When you first log in and try to run R, the shell will respond with “command not found”. To activate R in your environment you would type:
module add R
That would then give you access to the most recent version of R available (4.1.0 in this case).
To use a specific version you would type something like:
module add R/3.6.3
To get a list of all available software you can type:
module avail
To get a full list of module commands:
module --help
There's a shorthand version of the module command: ml. To load a module you can use just:
ml fastp
to, for instance, load the fastp program.
You can also issue the other module commands using ml:
ml avail
ml list
ml purge
...
Just typing ml on its own is the same as ml list.
Whether using module or ml you can load multiple modules with a single command:
module add R/4.1.0 samtools bedtools
or
ml R/4.1.0 samtools bedtools
You have several options as to where to use “module load” commands.
When you issue a module load command, the modules program checks whether you already have a different version of the same program loaded as a module. If you do, it reports an error and does not load the version you asked for.
Suppose you have the latest version of R loaded in your .bashrc file, but also have a pipeline that has been installed and thoroughly tested with a previous version of R, and you have in the scripts for that pipeline something like:
module load R/3.6.3
This would generate an error because the latest version of R is already loaded.
So, in your script, you should unload R before loading the version you need:
module unload R
module load R/3.6.3
You might also consider unloading all modules, and loading only those you need, before starting up your pipeline:
module purge
module load R/3.6.3
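As a rough sketch, the purge-then-load pattern can be wrapped in a small function at the top of a pipeline script. The function name load_pipeline_modules is hypothetical and not something provided by the cluster. It only assumes that the module command is defined in the shell running the script.

```shell
# Hypothetical helper for the top of a pipeline script.
# Starts from a clean module environment, then loads only the modules
# passed as arguments, stopping at the first failure.
load_pipeline_modules() {
    # Fail early if the module command is not defined in this shell.
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    module purge || return 1
    for m in "$@"; do
        module load "$m" || return 1
    done
}
```

Calling load_pipeline_modules R/3.6.3 would then have the same effect as the two commands above, and further modules can be pinned by adding them as extra arguments.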
In general, the environment modules just edit your PATH variable (they usually prepend a directory to your PATH). The convention is that all programs loaded by “module load” can be found in a sub-directory of /opt. The naming convention is:
/opt/SOFTWARE/VERSION
Where SOFTWARE would be replaced by the name of the software, e.g. R, and VERSION would be replaced by a version number for that software, e.g. 3.6.3 (for R).
Usually (but not always) the executable programs will be in /opt/SOFTWARE/VERSION/bin.
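To check whether a module has actually put its directory on your PATH, something like the following sketch could be used. The helper name path_has_dir is hypothetical, and the directory is built from the /opt/SOFTWARE/VERSION/bin convention, which, as noted, does not hold for every package.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in PATH.
path_has_dir() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumes the /opt/SOFTWARE/VERSION/bin convention.
software=R
version=3.6.3
bin_dir="/opt/$software/$version/bin"

if path_has_dir "$bin_dir"; then
    echo "$bin_dir is on PATH"
else
    echo "$bin_dir is not on PATH (module not loaded, or a different layout)"
fi
```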
You can see exactly what “module load” does for a piece of software by looking at the modulefile for that piece of software. It can be found at:
/opt/modules/modulefiles/SOFTWARE/VERSION
(This is a file - not a directory.)
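A minimal sketch for printing a modulefile, assuming the location given above. The function name show_modulefile and the MODULEFILES_ROOT override are hypothetical and are only there to make the path easy to change.

```shell
# Hypothetical helper: print the modulefile for SOFTWARE ($1) and VERSION ($2),
# assuming the /opt/modules/modulefiles/SOFTWARE/VERSION layout.
# MODULEFILES_ROOT is an assumed override, not something the cluster defines.
show_modulefile() {
    root="${MODULEFILES_ROOT:-/opt/modules/modulefiles}"
    file="$root/$1/$2"
    if [ -f "$file" ]; then
        cat "$file"
    else
        echo "no modulefile at $file" >&2
        return 1
    fi
}
```

For example, show_modulefile R 3.6.3 would print the modulefile for R 3.6.3 if it exists at that location. The built-in module show R/3.6.3 command reports similar information without needing to know the path.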
If you prefer not to use modules, you can just update your PATH to include the directories of the pieces of software you want to use (possibly updating your PATH in .bashrc).
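For the manual route, a sketch of what might go in .bashrc is shown here. The helper name prepend_path is hypothetical, and the directory is only an example based on the /opt/SOFTWARE/VERSION/bin convention, so check that it exists before relying on it.

```shell
# Hypothetical helper for ~/.bashrc: put a directory at the front of PATH
# unless it is already listed there.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already listed: leave PATH unchanged
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Example directory following the /opt/SOFTWARE/VERSION/bin convention.
prepend_path /opt/R/3.6.3/bin
```

The check for an existing entry keeps PATH from growing each time .bashrc is sourced again.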
In most cases, modules is just a nice, easy way of updating your PATH, so it seems preferable to use the module command rather than updating your PATH explicitly.
There are a couple of special purpose modules.
More information about environment modules can be found here: https://modules.readthedocs.io/en/latest/