Differences

This shows you the differences between two versions of the page.

--- copying_data_to_the_nodes [2014/04/24 13:32]
root created
+++ copying_data_to_the_nodes [2014/11/04 17:44] (current)
root
@@ Line 1: / Line 1: @@
 ====== Copying Data to the Nodes ======
-If you have a job that reads the same data file many times, or makes random accesses to a data file, it may be more efficient to have that data locally on a node than compete with other users to access the file server.
+If you have a job that reads the same data file many times, or makes many "random" accesses to a data file, it may be more efficient to have that data locally on a node than compete with other users to access the file server.
 Each node has almost 1TB of space mounted on /tmp. This /tmp space is local to each node.
@@ Line 9: / Line 9: @@
 If what your program does is read a file strictly sequentially just once, this copy is unlikely to help.
-There is a couple of options of doing the copy.
+There are a couple of options for doing the copy.
 ) Do it directly in your script.
@@ Line 15: / Line 15: @@
 <code>
 cp /home/username/mydata.fastq /tmp
-run your process on the data in .tmp
+... Run your process on the data in /tmp ...
+rm /tmp/mydata.fastq
 </code>
 (Really you would use mktemp to get a unique name to avoid clashes.)
+Be careful if you have multiple copies of your script running on a node: you could be copying the data multiple times.
 ) Copy to all nodes allocated to your task using sbcast.
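
The mktemp remark under the first option can be sketched roughly as follows. This is only an illustration, not the page's own recipe: the function name stage_to_local and the stage.XXXXXX directory template are made up for this example, and /tmp is assumed to be the node-local space described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page):
#   stage_to_local SRC [BASE]
# Copies SRC into a newly created, uniquely named directory under BASE
# (default: /tmp) and prints the path of the local copy on stdout.
stage_to_local() {
    src=$1
    base=${2:-/tmp}
    # mktemp -d creates the directory with a unique name, so two jobs
    # staging a file with the same name do not clash.
    workdir=$(mktemp -d "$base/stage.XXXXXX") || return 1
    # If the copy fails, remove the half-filled directory again.
    cp "$src" "$workdir/" || { rm -rf "$workdir"; return 1; }
    printf '%s\n' "$workdir/$(basename "$src")"
}

# Possible use inside a job script (paths as in the example above):
#   local_copy=$(stage_to_local /home/username/mydata.fastq) || exit 1
#   trap 'rm -rf "$(dirname "$local_copy")"' EXIT   # clean up on exit
#   ... Run your process on "$local_copy" ...
```

Each call makes its own private copy, so this avoids name clashes but does not by itself stop several copies of a script on one node from each copying the data.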
