Description | Hands On Lab Exercises for SLURM |
---|---|
Related-course materials | Linux for Dummies |
Authors | Julie Orjuela-Bouniol (julie.orjuela@ird.fr) - i-trop platform (UMR BOREA / DIADE / IPME - IRD) |
Creation Date | 21/09/2018 |
Last Modified Date | 22/09/2019 |
SLURM: some basic commands
Slurm offers many commands you can use to interact with the system.
The sinfo command gives an overview of the resources offered by the cluster.
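As a minimal sketch, the following shows two common ways of calling sinfo. The `have_cmd` helper is a hypothetical guard added here so the snippet still runs cleanly on a machine without Slurm; it is not part of Slurm itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the named command is on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

if have_cmd sinfo; then
    sinfo          # summary: one line per partition and node state
    sinfo -N -l    # one line per node, long format
else
    echo "sinfo not found: run this on a cluster login node"
fi
```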
The squeue command shows which jobs those resources are currently allocated to (the SGE equivalent is qstat). It lists jobs that are currently running (RUNNING state, noted ‘R’) or waiting for resources (PENDING state, noted ‘PD’).
You can display more detailed information with the -o option; the -i parameter reruns squeue every n seconds.
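The options above can be sketched as follows. The column layout passed to -o is only an example, and `count_states` is a hypothetical helper (not a Slurm command) that counts jobs per state code.

```shell
#!/bin/sh
# Hypothetical helper: turn a list of state codes (one per line)
# into "STATE=count" lines.
count_states() { sort | uniq -c | awk '{print $2 "=" $1}'; }

if command -v squeue >/dev/null 2>&1; then
    me="${USER:-$(id -un)}"
    squeue -u "$me"                                   # only your own jobs
    squeue -o "%.10i %.9P %.20j %.8u %.2t %.10M %R"   # custom columns
    squeue -h -o "%t" | count_states                  # number of jobs per state
    # squeue -i 5 would refresh the listing every 5 seconds until Ctrl-C
else
    echo "squeue not found: run this on a cluster login node"
fi
```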
Allocate resources with the srun command
If you simply need an interactive Bash session on a compute node, with the same environment as batch jobs, use srun. srun allocates an interactive session with the resources specified in its parameters (similar to qrsh or qhost in SGE).
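A possible invocation is sketched below. The resource values (2 CPUs, 4 GB of RAM, 1 hour) are assumptions chosen for illustration; adjust them to your cluster's limits. The snippet only opens the session when srun exists and it is run from a terminal, and otherwise prints the command it would have run.

```shell
#!/bin/sh
# Assumed example values, not site defaults.
CPUS=2
MEM=4G
WALLTIME=01:00:00
SRUN_OPTS="--cpus-per-task=$CPUS --mem=$MEM --time=$WALLTIME"

if command -v srun >/dev/null 2>&1 && [ -t 0 ]; then
    # Opens a Bash shell on a compute node; type "exit" to release the resources.
    # SRUN_OPTS is left unquoted on purpose so it splits into separate options.
    srun $SRUN_OPTS --pty bash -i
else
    echo "would run: srun $SRUN_OPTS --pty bash -i"
fi
```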
How do you create a job?
A job consists of two parts: resource requests and job steps.
Resource requests specify the number of CPUs, the expected computing time, the amount of RAM or disk space, etc. Job steps describe the tasks that must be done and the software that must be run.
The typical way of creating a job is to write a submission script. A submission script is a shell script whose comments, if they are prefixed with #SBATCH, are understood by Slurm as parameters describing resource requests and other submission options. You can get the complete list of parameters from the sbatch manpage (man sbatch).
In this example, job.sh contains resource requests (lines starting with #SBATCH) and a Unix sleep command.
job.sh requests one CPU for 10 minutes, along with 100 MB of RAM, in the default queue. When started, the job runs the sleep command.
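A job.sh matching this description could look like the sketch below. The job name, the output file name and the sleep duration are assumptions made for illustration; only the CPU, time and memory requests come from the description above. Because the #SBATCH lines are ordinary shell comments, the script also runs unchanged under plain Bash.

```shell
#!/bin/bash
# Hypothetical job name and log file (%j expands to the job ID).
#SBATCH --job-name=sleep_test
#SBATCH --output=job_%j.out
# One task (one CPU), 10 minutes of wall time, 100 MB of RAM per CPU.
# No partition is named, so the default queue is used.
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --mem-per-cpu=100

# Job step: the sleep duration is an assumed value for the example.
start=$(date +%s)
sleep 2
end=$(date +%s)
echo "Slept for $((end - start)) seconds"
```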
The sbatch command submits a script.
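Submitting job.sh can be sketched as follows. On success sbatch prints a confirmation line of the form "Submitted batch job <id>"; `parse_jobid` is a hypothetical helper that extracts the ID from that line so it can be reused with other commands.

```shell
#!/bin/sh
# Hypothetical helper: extract the job ID from sbatch's
# "Submitted batch job <id>" confirmation line.
parse_jobid() { awk '/^Submitted batch job/ {print $4}'; }

if command -v sbatch >/dev/null 2>&1 && [ -f job.sh ]; then
    JOBID=$(sbatch job.sh | parse_jobid)
    echo "job.sh submitted with job ID $JOBID"
else
    echo "sbatch or job.sh not found: nothing submitted"
fi
```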
Interestingly, you can get near-real-time information about your running program (memory consumption, etc.) with the sstat command.
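A minimal sketch, assuming the job ID is passed in a JOBID environment variable (a convention chosen for this example). The selected columns are one possible choice of sstat fields, and `batch_step` is a hypothetical helper that appends the ".batch" step suffix, which is typically needed when the script runs its commands directly rather than through srun.

```shell
#!/bin/sh
# Columns to display (one possible selection of sstat fields).
FIELDS="JobID,AveCPU,AveRSS,MaxRSS,MaxVMSize"
# JOBID is expected from the environment, e.g. JOBID=12345 (placeholder).
JOBID="${JOBID:-}"

# Hypothetical helper: name the batch step of a job.
batch_step() { printf '%s.batch\n' "$1"; }

if [ -n "$JOBID" ] && command -v sstat >/dev/null 2>&1; then
    sstat -j "$(batch_step "$JOBID")" --format="$FIELDS"
else
    echo "set JOBID to a running job ID and run this on a cluster login node"
fi
```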
If you want to cancel a job, use the scancel command (the SGE equivalent is qdel).
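Cancelling a job could be sketched as below, assuming the job ID is given in a JOBID environment variable. `is_jobid` is a hypothetical safety check that accepts only plain numeric IDs (job array IDs such as 123_4 would be rejected by it), so that an empty or mistyped value is never passed to scancel.

```shell
#!/bin/sh
# Hypothetical helper: accept only purely numeric job IDs.
is_jobid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# JOBID is expected from the environment, e.g. JOBID=12345 (placeholder).
JOBID="${JOBID:-}"

if is_jobid "$JOBID" && command -v scancel >/dev/null 2>&1; then
    scancel "$JOBID"
    echo "cancellation requested for job $JOBID"
else
    echo "set JOBID to a numeric job ID and run this on a cluster login node"
fi
```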
If you want to know the resources used by a finished job, use the seff command.
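A short sketch, again assuming the job ID is given in a JOBID environment variable. `summarize` is a hypothetical filter that keeps only the state and efficiency lines; it assumes the report labels them "State:", "CPU Efficiency:" and "Memory Efficiency:", which may vary between seff versions.

```shell
#!/bin/sh
# Hypothetical filter: keep only the state and efficiency lines of a seff report.
summarize() { grep -E '^(State|CPU Efficiency|Memory Efficiency):'; }

# JOBID is expected from the environment, e.g. JOBID=12345 (placeholder).
JOBID="${JOBID:-}"

if [ -n "$JOBID" ] && command -v seff >/dev/null 2>&1; then
    seff "$JOBID"               # full report
    seff "$JOBID" | summarize   # short version
else
    echo "set JOBID to a finished job ID and run this on a cluster login node"
fi
```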