
Description: Know how to use Slurm
Author: Ndomassi TANDO (ndomassi.tando@ird.fr)
Creation date: 08/11/2019
Modification date: 30/03/2020



Objectives

Know how to launch different types of jobs with Slurm.

Know how to monitor jobs with Slurm.


Launch jobs with Slurm:

Launch commands from the master node

The following command allocates computing resources (nodes, memory, cores) and immediately launches the given command on each allocated resource.

$ srun + command

Example:

$ srun hostname

This returns the name of the computing resource used.

Reserve computing resources to launch Slurm commands

We use:

$ salloc

This command allows you to reserve one or several computing resources while staying on the master node.

Commands can then be launched on the reserved resources with srun + arguments.

When you use this command, it is important to specify a reservation time with the --time option.

Example: we reserve 2 nodes (option -N) for 5 minutes on the short partition (option -p), then run the hostname command with srun:

$ salloc --time=05:00 -N 2 -p short
$ srun hostname

We obtain:

[tando@master0 ~]$ srun hostname
node21.alineos.net
node14.alineos.net
 

Connect to a node in interactive mode and launch commands:

To connect to a node in interactive mode for X minutes, use the following command:

$ srun -p short --time=X:00 --pty bash -i

Then you can launch commands on this node without using the srun prefix, as in the sketch below.
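For example, a minimal interactive session could look like this (the 10-minute reservation and the node name are only illustrative):

$ srun -p short --time=10:00 --pty bash -i
[tando@node14 ~]$ hostname    # runs directly on the allocated node, no srun prefix needed
node14.alineos.net
[tando@node14 ~]$ exit        # release the allocation when finished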

Connect to a node in interactive mode with x11 support:

The x11 support allows you to launch graphical software within a node.

You first have to connect to bioinfo-master.ird.fr with the -X option:

$ ssh -X login@bioinfo-master.ird.fr 

Then you can open an interactive session on a node with the --x11 option:

$ srun -p short --x11 --pty bash -i
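Once the interactive session is open, graphical programs started on the node are displayed on your local machine. For example, with any graphical tool installed on the cluster (xclock is only a placeholder here):

$ xclock   # the window is forwarded back to your local display through X11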

Partitions available:

Depending on the type of job you want to launch, you can choose between several partitions.

The partitions can be considered job queues, each of which has an assortment of constraints such as job size limit, job time limit, users permitted to use it, etc.

Priority-ordered jobs are allocated nodes within a partition until the resources (nodes, processors, memory, etc.) within that partition are exhausted.

Here are the available partitions:

| partition | role | nodes list | number of cores | RAM on nodes |
|-----------|------|------------|-----------------|--------------|
| short | short jobs < 1 day (higher priority, interactive jobs) | node0,node1,node2,node13,node14 | 12 cores | 48 to 64 GB |
| normal | jobs of maximum 7 days | node0,node1,node2,node13,node14,node15,node16,node17,node18,node19,node20,node22,node23,node24 | 12 to 24 cores | 64 to 96 GB |
| r900 | jobs of maximum 7 days | node5 | 16 cores | 32 GB |
| long | long jobs, from 7 days up to 45 days | node3,node8,node9,node10,node11,node12 | 12 to 24 cores | 48 GB |
| highmem | jobs with higher memory needs | node4,node7,node17,node21 | 12 to 24 cores | 144 GB |
| supermem | jobs with much higher memory needs | node25 | 40 cores | 1 TB |
| gpu | analyses on GPU cores | node26 | 24 CPUs and 8 GPUs | 192 GB |

Note that access to the gpu node is restricted; an access request should be made here: request access to gpu
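As an illustration only, a GPU job submission might look like the sketch below; the --gres=gpu:1 syntax assumes generic resources are configured on the gpu partition, so check with the administrators for the exact option to use:

#!/bin/bash
#SBATCH --partition=gpu       ### GPU partition (restricted access)
#SBATCH --job-name=gpu_test   ### Job name
#SBATCH --gres=gpu:1          ### Request one GPU (assumed gres configuration)
#SBATCH --time=01:00:00       ### Time limit

nvidia-smi   # list the GPU(s) allocated to the job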

Main options for Slurm:

salloc, srun or sbatch can be used with the following options:

| Action | Slurm option | SGE option |
|--------|--------------|------------|
| Choose a partition | -p [queue] | -q [queue] |
| Number of nodes to use | -N [min[-max]] | N/A |
| Number of tasks to use | -n [count] | -pe [PE] [count] |
| Time limit | -t [min] or -t [days-hh:mm:ss] | -l h_rt=[seconds] |
| Specify an output file | -o [file_name] | -o [file_name] |
| Specify an error file | -e [file_name] | -e [file_name] |
| Combine STDOUT and STDERR files | use -o without -e | -j yes |
| Copy the environment | --export=[ALL,NONE,variables] | -V |
| Type of notifications to send | --mail-type=[events] | -m [events] |
| Send a mail | --mail-user=[address] | -M [address] |
| Job name | --job-name=[name] | -N [name] |
| Relaunch job in case of problem | --requeue | -r [yes,no] |
| Set the working directory | --workdir=[dir_name] | -wd [directory] |
| Memory size | --mem=[mem][M,G,T] or --mem-per-cpu=[mem][M,G,T] | -l mem_free=[memory][K,M,G] |
| Charge to an account | --account=[account] | -A [account] |
| Tasks per node | --tasks-per-node=[count] | (fixed allocation_rule in PE) |
| CPUs per task | --cpus-per-task=[count] | N/A |
| Job dependency | --depend=[state:job_id] | -hold_jid [job_id,job_name] |
| Job host preference | --nodelist=[nodes] AND/OR --exclude=[nodes] | -q [queue]@[node] OR -q [queue]@@[hostgroup] |
| Job arrays | --array=[array_spec] | -t [array_spec] |
| Begin time | --begin=YYYY-MM-DD[THH:MM[:SS]] | -a [YYMMDDhhmm] |
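As an illustration, several of these options can be combined on a single command line; the values below are arbitrary:

$ srun -p normal -N 1 -n 4 -t 60 --mem=8G --job-name=test_options hostname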

Launch jobs via a script

Batch mode allows you to launch an analysis by following the steps described in a script.

Slurm accepts different types of scripts, such as bash, perl or python.

Slurm allocates the desired computing resources and launches the analyses on these resources in the background.

To be interpreted by Slurm, the script should contain a specific header in which each #SBATCH keyword specifies a Slurm option.

Slurm script example:

#!/bin/bash
## Define the job name
#SBATCH --job-name=test
## Define the output file
#SBATCH --output=res.txt
## Define the number of tasks
#SBATCH --ntasks=1
## Define the execution time limit
#SBATCH --time=10:00
## Define 100 MB of memory per cpu
#SBATCH --mem-per-cpu=100
sleep 180 # launch a 3-minute sleep 

To launch an analysis, use the following command:

$ sbatch script.sh

With script.sh: the name of the script to use
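On submission, sbatch prints the id of the new job, which can then be followed with squeue (the job id shown here is arbitrary):

$ sbatch script.sh
Submitted batch job 12345
$ squeue -j 12345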

Submit an array job

#!/bin/bash
#SBATCH --partition=short      ### Partition
#SBATCH --job-name=ArrayJob    ### Job Name
#SBATCH --time=00:10:00        ### WallTime
#SBATCH --nodes=1              ### Number of Nodes
#SBATCH --ntasks=1             ### Number of tasks per array job
#SBATCH --array=0-19%4         ### Array indexes from 0 to 19, with at most 4 jobs running at a time

 
echo "I am Slurm job ${SLURM_JOB_ID}, array job ${SLURM_ARRAY_JOB_ID}, and array task ${SLURM_ARRAY_TASK_ID}."

You have to use the #SBATCH --array option to define the range.

The variable ${SLURM_JOB_ID} contains the job id.

${SLURM_ARRAY_JOB_ID} contains the id of the job array.

${SLURM_ARRAY_TASK_ID} contains the index of the array task.

The script should give an output like:

$ sbatch array.srun
Submitted batch job 20303
$ cat slurm-20303_1.out
I am Slurm job 20305, array job 20303, and array task 1.
$ cat slurm-20303_19.out
I am Slurm job 20323, array job 20303, and array task 19.
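A common use of ${SLURM_ARRAY_TASK_ID} is to make each task process its own input file. The sketch below assumes hypothetical files named input_0.txt to input_19.txt:

#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=ArrayFiles
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --array=0-19%4

# Each array task picks the input file matching its index (hypothetical file names)
INPUT=input_${SLURM_ARRAY_TASK_ID}.txt
wc -l ${INPUT}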

Submit an R job

You can use the same syntax as before for Slurm. You just have to launch your R script with the Rscript command.

#!/bin/bash
## Define the job name
#SBATCH --job-name=test
## Define the output file
#SBATCH --output=res.txt
## Define the number of tasks
#SBATCH --ntasks=1
## Define the execution time limit
#SBATCH --time=10:00
## Define 100 MB of memory per cpu
#SBATCH --mem-per-cpu=100
Rscript script.R # launch the R script script.R 
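If your analysis needs parameters, they can be passed after the script name; input.csv and output.csv below are placeholders, and inside script.R they would be retrieved with commandArgs(trailingOnly = TRUE):

Rscript script.R input.csv output.csv # the R script receives the two file names as arguments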

Submit a job with several commands in parallel at the same time

You have to use the options --ntasks and --cpus-per-task.

Example:

#!/bin/bash

#SBATCH --ntasks=2
#SBATCH --cpus-per-task=2

srun --ntasks=1 sleep 10 & 
srun --ntasks=1 sleep 12 &
wait

In this example, we use 2 tasks with 2 cpus allocated per task, that is to say 4 cpus allocated for this job.

For each task, a sleep is launched at the same time.

Notice the use of srun to launch each parallelised command and of & to launch the command in the background.

The wait command is needed here so that the job waits for the end of each command before stopping.

Submit an OpenMP job:

An OpenMP job is a job using several cpus on a single node; therefore the number of nodes will always be one.

This will work with a program compiled with OpenMP.

#!/bin/bash
#SBATCH --partition=short   ### Partition
#SBATCH --job-name=HelloOMP ### Job Name
#SBATCH --time=00:10:00     ### WallTime
#SBATCH --nodes=1           ### Number of Nodes
#SBATCH --ntasks-per-node=1 ### Number of tasks (MPI processes)
#SBATCH --cpus-per-task=28  ### Number of threads per task (OMP threads)
#SBATCH --account=hpcrcf    ### Account used for job submission
 
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
 
./hello_omp

Environment variables:

   SLURM_JOB_ID		The ID of the job allocation.
   SLURM_JOB_NAME		The name of the job.
   SLURM_JOB_NODELIST	List of nodes allocated to the job.
   SLURM_JOB_NUM_NODES	Number of nodes allocated to the job.
   SLURM_NTASKS		Number of CPU tasks in this job.
   SLURM_SUBMIT_DIR	The directory from which sbatch was invoked.
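As a minimal sketch, these variables can be used inside a batch script, for example to log where and how the job ran:

#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --ntasks=1
#SBATCH --time=00:05:00

# Print the job identity and the resources it received
echo "Job ${SLURM_JOB_ID} (${SLURM_JOB_NAME}) runs on ${SLURM_JOB_NUM_NODES} node(s): ${SLURM_JOB_NODELIST}"
echo "Submitted from ${SLURM_SUBMIT_DIR} with ${SLURM_NTASKS} task(s)"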

Delete a job

$ scancel <job_id>

With <job_id>: the job number
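scancel can also target several jobs at once, for example all the jobs of a given user:

$ scancel -u <user>              # cancel all running and pending jobs of <user>
$ scancel <job_id1> <job_id2>    # cancel several jobs by id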


Monitor resources:

Get job infos:

Use the command:

$ squeue 

To refresh the infos every 5 seconds:

$ squeue -i 5 

Infos on a particular job:

$ scontrol show job <job_id>

With <job_id>: the job number

Infos on the jobs of a particular user:

$ squeue -u <user> 

With <user>: the user login

More infos on jobs:

$ sacct --format=JobID,elapsed,ncpus,ntasks,state,nodelist
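For a single job, the same information can be requested with the -j option:

$ sacct -j <job_id> --format=JobID,elapsed,ncpus,ntasks,state,nodelist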

Infos on resources used by a finished job

$ seff <job_id>  

With <job_id>: the job number

You can add the following command at the end of your script to get the infos of the job in your output file.

$ seff $SLURM_JOB_ID

Get infos on partitions:

Type the following command:

$ sinfo 

It gives infos on partitions and nodes

To get more information:

$ scontrol show partitions 

scontrol show can be used with nodes, user, account, etc.

Know the time limit for each partition:

$ sinfo -o "%10P %.11L %.11l"

Get infos on nodes:

Type the command:

$ sinfo -N -l 

Several node states are possible (for example idle, mix, alloc, drain, down).

To obtain more information:

$ scontrol show nodes 


License

The resource material is licensed under the Creative Commons Attribution 4.0 International License (here).