Etiquette
Congratulations on getting an account on one of the IRHPC clusters. The documentation here is applicable to all of the machines. Please read this page carefully, and if you are in any doubt on how to perform your tasks on the cluster do not hesitate to contact support.
- Never give your login password or SSH key to anyone else
- Never connect to Elja through an insecure public network (see here)
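As a minimal illustration of the first point, an SSH key pair can be generated with a passphrase so that the private key is never usable on its own; the key type and file path below are only an example:

```bash
# Generate an SSH key pair protected by a passphrase (example key type and path).
# Keep the private key (~/.ssh/id_ed25519) to yourself; only the public key
# (~/.ssh/id_ed25519.pub) is ever uploaded or shared.
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519
```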
Login Nodes
The login nodes are a shared resource. This is where all users perform the tasks needed to prepare and submit their jobs.
That is why they should only be used for the following simple tasks:
- Submit jobs
- Edit scripts and files
- Prepare / delete data (scp, cp, mv, rm etc.)
- Run lightweight scripts (no computations)
- Compile small software packages (e.g. an in-house code)
System administrators will kill processes that are resource intensive.
Repeated offenses will result in the suspension of your user account.
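As a rough sketch of appropriate login-node use, the commands below only prepare, submit, and monitor a job; the file names are placeholders, and job submission is assumed to go through the Slurm commands sbatch and squeue:

```bash
# Copy input data into the job directory (lightweight file operations are fine)
cp ~/datasets/input.dat ~/runs/job01/

# Edit the job script, then submit it to the scheduler
nano ~/runs/job01/job_script.sh
sbatch ~/runs/job01/job_script.sh

# Check the status of your queued and running jobs
squeue -u $USER
```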
Available Login Nodes
Currently there are four login nodes available to users, depending on what HPC resource the PI has applied for: login, slogin1, slogin2, and slogin3. These four login nodes are used to manage work on the two clusters, Elja and Stefnir. Elja hosts the login node, while Stefnir hosts the three slogin nodes.
The login and slogin1 nodes serve the same purpose, while slogin2 and slogin3 are meant for specific use cases: slogin2 is meant for data transfer, and slogin3 allows users to run GUI (Graphical User Interface) software on the cluster.
An overview of these login nodes is given in the following table:
| Login node | Connection name | Cluster | Use case |
|---|---|---|---|
| login | elja.hi.is | Elja | Common login node for Elja |
| slogin1 | slogin1.rhi.hi.is | Stefnir | Common login node for Stefnir |
| slogin2 | slogin2.rhi.hi.is | Stefnir | Data transfer |
| slogin3 | slogin3.rhi.hi.is | Stefnir | Graphical visualization |
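For example, connecting to the nodes listed above could look like the following; replace username with your own account name, and note that the -X flag for GUI work on slogin3 is an assumption, as other mechanisms may be provided:

```bash
# Shell on the common Elja and Stefnir login nodes
ssh username@elja.hi.is
ssh username@slogin1.rhi.hi.is

# Copy data to Stefnir through the data-transfer node
scp mydata.tar.gz username@slogin2.rhi.hi.is:~/

# Log in with X11 forwarding for graphical software
ssh -X username@slogin3.rhi.hi.is
```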
Resource Management
Your home directory is located in one of two places, both named after your username: one is hosted on the nfs-irhpc NFS server at /users/home/username, and the other is /hpchome/username, hosted on the NetApp parallel file system.
The disk space allocated to the NFS and NetApp home directories is a shared resource and is not intended for the storage of large data. It is advisable to delete, or periodically transfer, files and data not being used for jobs from your home directory to storage outside of the NFS server, such as your personal computer, or into project directories on NetApp if the PI has applied for enhanced storage.
The location of your home directory depends on when your account was created. As of now, the home directories of most new accounts are located in /hpchome/username.
To check the location of your home directory, simply type on the command line:
echo $HOME
If you require disk space to store large amounts of data (more than 1 TB) for later jobs, or if your jobs generate large amounts of data that require further processing, please contact support. Other solutions can be provided.
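As a sketch of the housekeeping described above, with placeholder paths and the elja.hi.is login node as an example, you could check your usage on the cluster and then pull finished data back to your personal computer:

```bash
# On the cluster: see how much space your home directory is using
du -sh $HOME

# On your personal computer: copy a finished results directory home,
# then remove it from the cluster once the copy has been verified
rsync -av username@elja.hi.is:~/old_results/ ~/cluster_backups/old_results/
ssh username@elja.hi.is 'rm -r ~/old_results'
```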
Scratch Disk
Each compute node has a dedicated /scratch/ disk (see here for hardware specs). It is a local disk intended for the temporary storage of data to be processed and for writing output. This disk provides fast I/O (input/output) when running jobs. Users have read/write privileges under
/scratch/users/
See here for instructions on how to make use of the /scratch/ disks.
IMPORTANT: It is important to make efficient use of Elja and not to slow down network traffic on the cluster. Hence, it is advisable to copy the data and input for your job over to the local scratch disk on the node (/scratch/users/uname) and launch the program from that scratch directory. If this step is omitted, the program will still run on the compute node, but it will constantly read/write from the directory on the NFS server. This creates a lot of network traffic that slows down the use of Elja for everybody, and it will also slow down the job itself.
It is important to clean up after your job on the /scratch/ disk. If your job crashes and leaves behind data which you think can be salvaged, contact support as soon as possible. System administrators will delete data on the /scratch/ disks that is not associated with a running job, without notice.
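A minimal job-script sketch of this pattern, assuming the Slurm scheduler and placeholder program and file names (the linked instructions above describe the recommended procedure):

```bash
#!/bin/bash
#SBATCH --job-name=scratch_example
#SBATCH --nodes=1

# Create a personal work directory on the node-local scratch disk
SCRATCHDIR=/scratch/users/$USER/$SLURM_JOB_ID
mkdir -p "$SCRATCHDIR"

# Copy the program and its input from the submission directory to local scratch
cp "$SLURM_SUBMIT_DIR"/my_program "$SLURM_SUBMIT_DIR"/input.dat "$SCRATCHDIR"/

# Run from scratch so all heavy I/O stays on the local disk
cd "$SCRATCHDIR"
./my_program input.dat > output.log

# Copy the results back to the submission directory and clean up scratch
cp output.log "$SLURM_SUBMIT_DIR"/
rm -rf "$SCRATCHDIR"
```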
Sensitive Data
If you require an account on the HPC system and you need to store sensitive data, please contact the system administrators via help@hi.is.