Submitting, monitoring, and canceling jobs
About
This document gives an overview of the basic methods for submitting, monitoring, and canceling jobs that are queued or running.
Before following the recommendations below, users who will manage jobs on Discoverer must:
- Have a valid account to access the Discoverer cluster
- Be aware of the way we at Discoverer allocate and count the computational resources (see Computational resources allocation and accounting)
- Read the document Organizing your Slurm batch scripts
Job submission
Batch script submission
Warning
This is the preferred way of submitting jobs.
Important
Before submitting a script to the queue, make sure it meets the requirements described in Organizing your Slurm batch scripts.
The easiest way to send a Slurm batch script to the queue is to run the following command on the login node:
sbatch job.batch
where job.batch is the file containing the script lines needed to run the job. The sbatch documentation:
https://slurm.schedmd.com/sbatch.html
describes the additional options that may, if required, be incorporated into the script or passed as arguments to sbatch.
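As an illustration of both routes, the sketch below writes a minimal job.batch containing a few common #SBATCH directives and then overrides one of them on the command line. All values (job name, task count, time limit, output file name) are placeholders chosen for this example and are not Discoverer requirements; the directives a real script needs are described in Organizing your Slurm batch scripts.

```shell
# Write a minimal batch script. Every value below is a placeholder.
cat > job.batch <<'EOF'
#!/bin/bash
# Name shown by squeue
#SBATCH --job-name=example
# Number of tasks
#SBATCH --ntasks=1
# Wall-time limit (HH:MM:SS)
#SBATCH --time=00:10:00
# Output file; %j expands to the job ID
#SBATCH --output=slurm.%j.out

echo "Running on $(hostname)"
EOF

# Submit only where sbatch is available (i.e. on the login node).
# Options given on the command line take precedence over #SBATCH lines,
# so this submission runs with a 5-minute limit instead of 10.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --time=00:05:00 job.batch
fi
```

Keeping the options inside the script makes a run easier to reproduce, while command-line arguments are convenient for one-off changes.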
Upon successful submission, sbatch prints the assigned job ID (an integer). The submitted job can later be monitored or canceled using that ID.
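Because the job ID is needed for monitoring and canceling, it can be convenient to store it in a shell variable at submission time. The sketch below assumes the usual confirmation line printed by sbatch ("Submitted batch job <ID>"); the helper name extract_job_id is hypothetical and introduced only for this example.

```shell
# Extract the numeric job ID from sbatch's confirmation line,
# which normally reads "Submitted batch job <ID>".
extract_job_id() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# Run the submission only where sbatch exists (the login node).
if command -v sbatch >/dev/null 2>&1; then
    job_id=$(sbatch job.batch | extract_job_id)
    echo "Submitted job ${job_id}"
fi
```

Alternatively, sbatch --parsable prints only the job ID (followed by a semicolon and the cluster name on multi-cluster setups), which avoids parsing the text.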
Interactive job submission and execution
Warning
This way of running jobs is not promoted or supported by us.
Monitoring of submitted jobs
The easiest way to monitor a job is to run:
squeue -j jobID
where jobID is the job ID provided by sbatch upon submission.
To list all your submitted jobs, run:
squeue --me
The squeue tool can provide extended information about the job size and its execution. The squeue documentation:
https://slurm.schedmd.com/squeue.html
lists all the options one can pass to squeue.
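When a script has to wait for a job to finish, squeue can be polled until the job is no longer listed. The following is a minimal sketch: the function names job_in_queue and wait_for_job are hypothetical, and the 60-second default interval is an arbitrary choice intended to avoid querying the scheduler too often.

```shell
# Succeed while squeue still lists the job (for example pending or running).
job_in_queue() {
    # -h suppresses the header line, -j selects the job by ID
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Poll until the job leaves the queue.
# $1 = job ID, $2 = polling interval in seconds (default: 60)
wait_for_job() {
    job_id=$1
    interval=${2:-60}
    while job_in_queue "$job_id"; do
        sleep "$interval"
    done
    echo "Job $job_id is no longer in the queue"
}

# Usage on the login node (123456 is a placeholder job ID):
#   wait_for_job 123456 120
```

A job that is no longer in the queue has not necessarily succeeded; the job's output files still need to be checked.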
Canceling jobs
Only successfully submitted jobs can be canceled.
To cancel a job with ID jobID, use scancel:
scancel jobID
Warning
If a job cannot be canceled, ask the support team to cancel it (see Getting help).
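To find out whether a cancellation took effect, scancel can be followed by a squeue query. The sketch below is one possible way to do this; the function name cancel_job and the 5-second default pause are assumptions made for the example.

```shell
# Cancel a job, then check whether squeue still lists it.
# $1 = job ID, $2 = seconds to wait before checking (default: 5)
cancel_job() {
    job_id=$1
    scancel "$job_id" || return 1
    sleep "${2:-5}"
    if [ -n "$(squeue -h -j "$job_id" 2>/dev/null)" ]; then
        # The job may still be shown briefly while it is completing
        echo "Job $job_id is still listed by squeue"
        return 2
    fi
    echo "Job $job_id is no longer listed by squeue"
}

# Usage on the login node (123456 is a placeholder job ID):
#   cancel_job 123456
```

If the job remains listed well after the cancellation request, that is the situation in which contacting the support team is appropriate.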
Getting help
See Getting help