You can access the internet from the cluster head node, so you can download data to the cluster from elsewhere. Many tools and protocols are available for this; commonly used ones are wget, curl, sftp and ftp.
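If you would rather script a download than call one of those tools by hand, the same thing can be done from Python's standard library. The following is only a minimal sketch: the function name `download` and the example URL are made up for illustration and are not part of the cluster setup.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> int:
    """Stream `url` to `dest` in chunks and return the size of the result in bytes.

    Streaming avoids holding a large file in memory all at once.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest.stat().st_size


if __name__ == "__main__":
    # Hypothetical URL; replace it with the data set you actually need.
    size = download("https://example.org/data/archive.tar.gz", Path("archive.tar.gz"))
    print(f"downloaded {size} bytes")
```

For very large or interruptible transfers, wget or curl with their resume options are probably still the more convenient choice.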
You can also access the internet from any of the compute nodes. You could, for instance, send lookup requests to an external database (e.g. NCBI) from a node. Connections cannot be opened directly from the outside world to a node. If you need such a connection, you can set one up using SSH tunneling, but it depends on you logging in to your account.
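As an illustration of a lookup request sent from a node, here is a small sketch that queries NCBI's E-utilities `esearch` service. The helper names `build_esearch_url` and `esearch_ids` are hypothetical, and the layout of the JSON reply (`esearchresult` / `idlist`) is an assumption you should check against the NCBI documentation, along with their usage limits.

```python
import json
import urllib.parse
import urllib.request

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db: str, term: str, retmax: int = 5) -> str:
    """Build an esearch URL for database `db` and search term `term`."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urllib.parse.urlencode(params)}"


def esearch_ids(db: str, term: str, retmax: int = 5) -> list:
    """Send the request and return the matching record IDs.

    Assumes the reply is JSON with the IDs under esearchresult -> idlist.
    """
    url = build_esearch_url(db, term, retmax)
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]


if __name__ == "__main__":
    print(esearch_ids("pubmed", "BRCA1 human"))
```

This is an outgoing connection started on the node, so it works without any tunneling.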
Note, though, that all traffic to the internet at large goes through the head node and out through the head node's internet connection. This creates a bottleneck: running large data downloads on multiple nodes does not give you more bandwidth to the outside world.
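The effect of the bottleneck can be made concrete with some rough arithmetic. The sketch below assumes that the head node's connection is the only limit and that it is shared evenly between the downloading nodes. The figures (100 GB of data, a 1 Gbit/s link) are invented for the example and say nothing about the real cluster.

```python
def download_time_seconds(total_bytes: float, uplink_bytes_per_s: float, n_nodes: int) -> float:
    """Estimate the wall-clock time to fetch `total_bytes` split evenly over `n_nodes`.

    Assumption: every node gets an equal share of the single shared uplink.
    """
    per_node_rate = uplink_bytes_per_s / n_nodes   # each node's share of the link
    per_node_bytes = total_bytes / n_nodes         # each node's share of the data
    return per_node_bytes / per_node_rate


if __name__ == "__main__":
    total = 100e9    # hypothetical: 100 GB to download
    uplink = 125e6   # hypothetical: 1 Gbit/s link, i.e. 125 MB/s
    for nodes in (1, 4, 16):
        print(nodes, "node(s):", download_time_seconds(total, uplink, nodes), "s")
```

Under these assumptions the estimate is the same for any number of nodes, because the node count cancels out: each node has less to fetch but also gets a proportionally smaller share of the link.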