You can access the internet from the cluster head node, so you can download data to the cluster from other sites. There are many tools and protocols for doing this. Commonly used ones are wget, curl, sftp, scp and ftp (the last is now considered insecure), and all of them are available on the cluster.
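The command-line tools above are the usual route, but a download can also be scripted. As a minimal sketch, assuming Python 3 is available on the head node (this is not stated above), the hypothetical helper `download` below streams a URL to a local file in fixed-size chunks using only the standard library, so very large files are never held in memory all at once:

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str | Path, chunk_size: int = 1 << 20) -> Path:
    """Hypothetical helper: stream the resource at `url` into the file `dest`.

    Data is copied in chunks of `chunk_size` bytes (1 MiB by default),
    so memory use stays small regardless of the file size.
    """
    dest = Path(dest)
    # Create the target directory if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest
```

A call would look like `download("https://example.org/dataset.tar.gz", "data/dataset.tar.gz")`, where the URL is only a placeholder. Because `urlopen` also accepts `file://` URLs, the helper can be tried out locally before pointing it at a remote server.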
You can also access the internet from any of the compute nodes. For instance, a job running on a node can send lookup requests to an external database such as NCBI. However, the outside world cannot open new connections to a node directly. If you need such a connection, you can set one up with SSH tunneling, but you must log in to your account to create the tunnel.
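As an illustration of a lookup from a compute node, here is a sketch that queries NCBI's E-utilities ESearch service. It assumes the public `esearch.fcgi` endpoint and its JSON reply layout (record IDs under `esearchresult`, key `idlist`); the helper names `esearch_url`, `parse_ids` and `esearch` are hypothetical. Only an outbound HTTPS connection is needed, which is the direction the nodes allow:

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request

# Assumed base URL of NCBI's E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Hypothetical helper: build an ESearch query URL asking for JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_ids(payload: dict) -> list[str]:
    """Hypothetical helper: pull the record IDs out of a decoded reply."""
    return payload["esearchresult"]["idlist"]


def esearch(db: str, term: str, retmax: int = 20) -> list[str]:
    """Hypothetical helper: run the query over the network, return the IDs."""
    with urllib.request.urlopen(esearch_url(db, term, retmax), timeout=30) as resp:
        return parse_ids(json.load(resp))
```

Inside a job script this might be called as `ids = esearch("nucleotide", "BRCA1[gene] AND human[orgn]", retmax=5)`. Keeping URL building and parsing separate from the network call means those parts can be checked without internet access. NCBI asks clients that do not use an API key to stay below about three requests per second, so loops that issue many queries should be throttled.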
Note, however, that all traffic to the wider internet goes out through the head node and its 1 Gbps internet connection. This creates a bottleneck: running large downloads on several nodes at once does not give you more bandwidth to the outside world, because every node shares that same link.
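A back-of-the-envelope calculation shows why parallel downloads do not help. A 1 Gbps link carries at most 10^9 / 8 = 125 MB per second, so 100 GB (8 × 10^11 bits) needs at least 800 seconds, a little over 13 minutes. The sketch below uses hypothetical function names and assumes that the head node's link is the only bottleneck, that it is shared evenly between the nodes, and that protocol overhead can be ignored. Under those assumptions the number of nodes cancels out:

```python
def transfer_seconds(total_bytes: float, link_bps: float = 1e9) -> float:
    """Lower bound on the time to move `total_bytes` through a link of
    `link_bps` bits per second (protocol overhead ignored)."""
    return total_bytes * 8 / link_bps


def per_node_bps(n_nodes: int, link_bps: float = 1e9) -> float:
    """Best-case share of the uplink per node, assuming `n_nodes`
    download simultaneously and the link is split evenly."""
    return link_bps / n_nodes


def wall_time_seconds(total_bytes: float, n_nodes: int, link_bps: float = 1e9) -> float:
    """Wall-clock time when the data is split evenly across `n_nodes`
    that all go out through the same head-node link."""
    share_bytes = total_bytes / n_nodes  # data each node fetches
    return transfer_seconds(share_bytes, per_node_bps(n_nodes, link_bps))
```

For example, `wall_time_seconds(100e9, 4)` gives each node a 25 GB piece at 250 Mbps and still arrives at 800 seconds, the same as a single node. Real transfers are slower than this lower bound because of protocol overhead and other users' traffic on the same link.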