SGE job submission for the cc2-login cluster
This page describes how to submit jobs with the SGE system.
Authors | Sébastien RAVEL |
---|---|
Research Unit | |
Institut | |
Creation Date | 30/03/2018 |
Last Modified Date | 30/03/2018 |
Keywords : qsub, qrsh, job, cc2-login
Date : 02/06/2017
Summary
- Summary
- How to run jobs correctly
- 1 - Use Module load
- 2 - Run Job
- 3 - Knowing / asking for resources
- 4 - Jobs Resources
- 5 - Get information about running jobs
- 6 - Delete job
- 7 - More information
How to run jobs correctly
WARNING: never start a program on the master node of the cluster (cf: cc2-admin).
1 - Use Module load
On the cluster, tools are not loaded by default. Each user must load the programs they want to use. This system makes it possible to keep several versions of the same tool installed without conflicts. The drawback is that a program does not work if you forget to load its module. Here are 3 methods to manage modules:
A - Load modules by default on connection
If you often use the same programs, you can add their module load commands to your .bashrc file, so that they are loaded at every connection. For example, I do this for Git and Python 2.7, which I use often.
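A minimal sketch of such a .bashrc addition is shown below. The module names are placeholders (assumptions, not taken from this page); run module avail on the cluster to find the real ones. The guard around the loop is also an assumption: it keeps the file harmless on machines where the module command does not exist.

```shell
# Fragment to append to ~/.bashrc
# NOTE: module names are hypothetical; check `module avail` for the real ones.
default_modules="git python/2.7"

# Only try to load modules where the `module` command exists
if command -v module >/dev/null 2>&1; then
    for m in $default_modules; do
        module load "$m"
    done
fi
```

After the next login, module list should show the loaded modules.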
B - Loading before job submission (before qsub)
This method loads the modules in your session before you run qsub. The -V parameter is then MANDATORY in the qsub command: it exports your current environment, including the loaded modules, to the compute node.
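As a sketch of this method, the commands below load a module in the current session and then submit with -V. The module name is hypothetical and run_raxml.sh stands for any job script; the guard is only there so the snippet does nothing harmful on a machine without SGE.

```shell
job_script="run_raxml.sh"   # any job script; this name is only an example

if command -v module >/dev/null 2>&1 && command -v qsub >/dev/null 2>&1; then
    module load bioinfo/raxml        # hypothetical module name
    # -V exports the current environment (including loaded modules) to the job
    qsub -V -cwd "$job_script"
else
    echo "module or qsub not found: run this on the cluster login node" >&2
fi
```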
C - Loading in Job Script
This is the most reliable method, because it is easy to forget to load modules before qsub. Here the compute node loads the modules itself: simply add the module load lines to the job script, before the command that runs the program. A script such as __run_raxml.sh__ would therefore start with its module load lines, followed by the program command.
PS: keep the -V argument in your qsub command, because it is sometimes still needed even when the modules are loaded in the script.
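A sketch of what such a script could look like, written as a shell here-document that creates the file. The module name, the input file and the RAxML options are illustrative assumptions, not the original example from this page.

```shell
# Create the job script; the quoted 'EOF' keeps the content literal
cat > run_raxml.sh <<'EOF'
#!/bin/bash
# Load the tool on the compute node (hypothetical module name)
module load bioinfo/raxml

# Run the program (input file and options are placeholders)
raxmlHPC -s alignment.phy -n test_run -m GTRGAMMA -p 12345
EOF

# Then submit it from the login node with:
#   qsub -V -cwd run_raxml.sh
```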
2 - Run Job
A - qsub mode
To submit a job, use the qsub command with the following arguments:
Job submission arguments | |
---|---|
-i file | Use file as the standard input of the job |
-o file | Write the standard output of the job (what would normally be displayed in the terminal) to file |
-e file | Write the standard error of the job (error messages) to file |
-N name | Name the job; the name is used for the output files (replaces the default name, STDIN) |
-cwd | Use the current working directory for input and output, rather than /homedir/username/ |
-q queue | Specify the queue |
-M my_address@work | Receive job information by e-mail at this address |
-m beas | Select the events that trigger an e-mail: |
b: beginning of the job | |
e: end of the job | |
a: job aborted | |
s: job suspended | |
-l mem_free=nG | Run the job with n GB of RAM (see below) |
-pe parallel_smp n | Run the job with n threads (see below) |
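Putting several of these arguments together, a submission could look like the sketch below. The job name, log file names and e-mail address are placeholders; normal.q is one of the queues listed on this page.

```shell
job_script="run_raxml.sh"     # example job script
job_name="raxml_test"         # placeholder job name
email="my_address@work"       # placeholder address

if command -v qsub >/dev/null 2>&1; then
    # -m bea: send an e-mail at the beginning, end and abort of the job
    qsub -V -cwd -q normal.q -N "$job_name" \
         -o "$job_name.out" -e "$job_name.err" \
         -M "$email" -m bea \
         "$job_script"
else
    echo "qsub not found: run this on the cluster login node" >&2
fi
```

On success, qsub prints the ID of the submitted job, which is the value the qstat and job deletion commands expect.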
B - qrsh mode (for testing scripts)
To avoid running tests on the master node, request an interactive job. This type of job has advantages and disadvantages:
- Advantages: lets you test scripts and debug programs
- Disadvantages: the job is killed if the terminal is closed
Use it ONLY for debugging.
The command is * qrsh *. It is similar to qsub and takes at least the -q argument.
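A small sketch of an interactive request, wrapped in a shell function so it can live in your .bashrc. The function name is illustrative, and the default queue normal.q is an assumption taken from the queue table on this page.

```shell
# Start an interactive session on a compute node.
# Usage: debug_session [queue]    (function name is illustrative)
debug_session() {
    # With no command given, qrsh opens a shell on the compute node
    qrsh -q "${1:-normal.q}"
}
```

Typing qrsh -q normal.q directly does the same thing; leave the session with exit, which also ends the job.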
3 - Knowing / asking for resources
A program relies on 2 types of computing resources:
- CPU / Thread
- RAM
CPU resources are counted in cores, and each core can run one or more threads. A program uses the CPU when it needs to do a lot of calculations.
RAM, expressed in GB or TB, is the working memory of the computer. Data is loaded into it so that calculations run faster (access is much faster than reading from disk). A program that needs to load files into memory uses more RAM.
With this in mind, you will recognize program parameters such as:
- CPU
- Thread
- Java -Xmx18G
- …
These are the parameters that let a program use more computing power.
On the cluster, the available resources are substantial:
Queue | Threads | RAM |
---|---|---|
long.q and normal.q | 48 | 200 GB |
bigmem.q | 96 | 2.6 TB |
4 - Jobs Resources
By default, a job uses 1 thread and 10 GB of RAM.
To give a job more power, you have to request more resources.
A - Request more RAM
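RAM is requested with the -l mem_free=nG argument. The sketch below embeds the request in the job script: qsub reads lines starting with #$ as if they were command-line arguments. The script name, the jar name and the 20 GB value are assumptions chosen for illustration.

```shell
# Create a job script that requests more memory
cat > run_bigmem.sh <<'EOF'
#!/bin/bash
# Ask SGE for 20 GB of RAM instead of the 10 GB default
#$ -l mem_free=20G
#$ -cwd
#$ -V

# Keep the program's own limit below the request (placeholder jar name)
java -Xmx18G -jar tool.jar
EOF

# The same request can be given on the command line instead:
#   qsub -V -cwd -l mem_free=20G run_bigmem.sh
```

The Java heap limit is set slightly below the requested amount because the JVM also uses memory outside the heap.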
B - Request more Threads
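Threads are requested with the -pe parallel_smp n argument. Requesting them does not make a program multithreaded by itself: the program must also be told how many threads to use. The sketch below assumes a hypothetical module name and placeholder RAxML inputs, and uses the NSLOTS variable that SGE sets inside the job to the number of slots granted.

```shell
# Create a job script that requests 8 threads
cat > run_raxml_threads.sh <<'EOF'
#!/bin/bash
# Ask SGE for 8 threads in the parallel_smp environment
#$ -pe parallel_smp 8
#$ -cwd
#$ -V

module load bioinfo/raxml    # hypothetical module name

# Pass the granted slot count to the program's own thread option
raxmlHPC-PTHREADS -T "$NSLOTS" -s alignment.phy -n test_run -m GTRGAMMA -p 12345
EOF

# Submit with:
#   qsub run_raxml_threads.sh
```

Using NSLOTS rather than a hard-coded number keeps the program's thread count in step with the request if the #$ -pe line is changed later.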
5 - Get information about running jobs
Job status | |
---|---|
qstat | Displays the status of all your jobs |
qstat -f | Displays the status of all queues (long list) |
qstat -u "*" | Displays the status of all jobs belonging to all users |
qstat -g c | Displays a summary of the resources available in each queue |
qstat -j jobid | Displays the status of a particular job (jobid = 1st qstat column) |
6 - Delete job
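Jobs are deleted with the SGE qdel command, followed by the job ID shown in the first column of qstat. A minimal sketch, with a made-up job ID:

```shell
job_id=123456    # hypothetical job ID, taken from the first column of qstat

if command -v qdel >/dev/null 2>&1; then
    qdel "$job_id"          # delete one job
    # qdel -u "$USER"       # delete all of your own jobs (uncomment with care)
else
    echo "qdel not found: run this on the cluster login node" >&2
fi
```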
7 - More information
https://doc.cc.in2p3.fr/en:ge_submit_a_job_qsub