
Copying Data to the Nodes

If you have a job that reads the same data file many times, or makes many “random” accesses to a data file, it may be more efficient to keep that data locally on a node than to compete with other users for access to the file server.

Each node has almost 1TB of space mounted on /tmp. This /tmp space is local to each node.

So you can copy your data to the node and access it locally; then, when your job is done, copy the results back to your home directory.

If your program reads the file strictly sequentially, and only once, copying it to /tmp is unlikely to help.

There are a couple of options for doing the copy.

1) Do it directly in your script:

cp /home/username/mydata.fastq /tmp
... Run your process on the data in /tmp ...
rm /tmp/mydata.fastq

(Really you would use mktemp to get a unique name to avoid clashes.)
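
For example, a more careful version of the same workflow might look like this. It is only a sketch: the program name myprogram and the results file name are placeholders, not part of the original example.

# create a uniquely named working directory on the node-local disk
DIR=$(mktemp -d /tmp/myjob.XXXXXX)
# copy the input data to the local directory
cp /home/username/mydata.fastq "$DIR"
# run the analysis against the local copy (myprogram is hypothetical)
myprogram "$DIR/mydata.fastq" > "$DIR/results.out"
# copy the results back to your home directory, then clean up
cp "$DIR/results.out" /home/username/
rm -rf "$DIR"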

Be careful if you have multiple copies of your script running on the same node: you could end up copying the data multiple times.
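
One way to guard against that is to serialize the copy with flock(1) and only copy if the file is not already there. Again a sketch, reusing the placeholder file names from above:

# take an exclusive lock before checking for / making the local copy,
# so only one instance of the script actually runs the cp
(
  flock 9
  if [ ! -f /tmp/mydata.fastq ]; then
    cp /home/username/mydata.fastq /tmp
  fi
) 9>/tmp/mydata.fastq.lock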

2) Copy to all nodes allocated to your task using sbcast.

sbcast will do a (possibly more efficient) copy to all the nodes in a SLURM allocation.
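
A minimal batch script using sbcast might look like the following. This is a sketch: the node count, file names, and myprogram are placeholder assumptions, not the contents of the example files listed below.

#!/bin/bash
#SBATCH --nodes=4
# broadcast the input file to local /tmp on every node in the allocation
sbcast /home/username/mydata.txt /tmp/mydata.txt
# run one task per node against the local copy (myprogram is hypothetical)
srun myprogram /tmp/mydata.txt
# remove the local copies when the job is done
srun rm /tmp/mydata.txt

Because sbcast ships the file once through SLURM's own communication layer, it can avoid having every node fetch the file from the file server independently.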

Example files: test_sbcast, sbcast_ls, mydata.txt
