Terminology
Below are common terms used throughout this wiki:
- Login node :: The front-facing node where you log in to submit, monitor, and view the results of your jobs. It is not intended for number crunching or computations.
- Job :: In order to run calculations on the HPC cluster, one submits a job. A job consists of the executable program to run as well as the options that configure the SLURM scheduler.
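As a concrete illustration of a job, here is a minimal SLURM batch script, a sketch only: the job name, output pattern, and resource requests are illustrative assumptions you should adjust to your own workload. The `#SBATCH` comment lines are the options read by the scheduler; everything after them is the program the job runs.

```shell
#!/bin/bash
# Minimal SLURM batch script (sketch; values below are assumptions).
#SBATCH --job-name=hello         # name shown in the queue
#SBATCH --output=hello.%j.out    # %j expands to the job ID
#SBATCH --nodes=1                # one compute node
#SBATCH --ntasks=1               # one task (process)
#SBATCH --time=00:05:00          # wall-clock limit, hh:mm:ss

# The actual work: here just report which compute node we landed on.
msg="Running on $(hostname)"
echo "$msg"
```

Submit it with `sbatch hello.sh` and watch it in the queue with `squeue -u $USER`.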
- Compute nodes :: The workhorses of the HPC cluster, where computations are performed. Most of the nodes fall under this category.
- Partition :: A partition defines a set of compute nodes which fall under the same category, reflecting for example different hardware.
- Scheduler :: A piece of software running on the login node which is responsible for allocating resources to jobs in the queue. The Elja HPC cluster makes use of the SLURM scheduler.
- Slurm :: Slurm is a cluster management and job scheduling tool. It is the system used to manage the queues, user accounts, and FairShare.
- FairShare :: An algorithm that makes sure heavy users move to the back of the queue when lighter users submit jobs.
- Queue :: A list of jobs waiting to be allocated resources. The time a job spends in the queue depends on the amount of resources and time requested. The queue, and the priority within it, is controlled by the SLURM scheduler.
- Lmod :: Lmod stands for "Lua Module", the module system used on the HPC cluster to maintain the tree structure of modules for users.
- Modules :: Modules are small Lua scripts that add executables to your PATH and make sure all libraries needed by that executable are available.
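A typical Lmod session looks like the sketch below. The module name and version are assumptions for illustration; run `module avail` to see what is actually installed on the cluster.

```shell
# Sketch of an Lmod session (module name/version "GCC/11.2.0" is assumed).
module avail            # list modules available on the cluster
module load GCC/11.2.0  # put this gcc on your PATH, with its libraries
module list             # show currently loaded modules
gcc --version           # now resolves via the module's bin directory
module purge            # unload all modules again
```

`module load` only edits your environment (PATH and library variables); `module purge` restores it, so experimenting is safe.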
- PATH :: The PATH variable is an environment variable containing an ordered list of directories that Linux will search for executables when running a command.
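The ordering of PATH can be demonstrated with a small experiment; the directory `/tmp/pathdemo/bin` and the command name `greet` below are made up for the sketch.

```shell
# Show that the first matching directory in PATH wins.
# Create a throwaway executable (names here are illustrative assumptions).
mkdir -p /tmp/pathdemo/bin
printf '#!/bin/sh\necho hello from pathdemo\n' > /tmp/pathdemo/bin/greet
chmod +x /tmp/pathdemo/bin/greet

# Prepending a directory makes it the first place searched:
export PATH="/tmp/pathdemo/bin:$PATH"
found=$(command -v greet)
echo "$found"    # -> /tmp/pathdemo/bin/greet
```

This is exactly the mechanism modules rely on: loading a module prepends its `bin` directory to PATH so its executables shadow any older versions further down the list.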