Submitting, monitoring, and canceling jobs

About

This document gives an overview of the basic methods for submitting, monitoring, and canceling running or queued jobs.

Before following the recommendations below, users who will manage jobs on Discoverer must:

  1. Have a valid account to access the Discoverer cluster
  2. Be aware of how computational resources are allocated and accounted for on Discoverer (see Computational resources allocation and accounting)
  3. Read the document Organizing your Slurm batch scripts

Job submission

Batch script submission

Warning

This is the preferred way of submitting jobs.

Important

Before submitting a script to the queue, make sure the script meets the requirements described in Organizing your Slurm batch scripts.

The simplest way to submit a Slurm batch script to the queue is to run the following command on the login node:

sbatch job.batch

where job.batch is the file containing the job script. The sbatch documentation:

https://slurm.schedmd.com/sbatch.html

describes the additional options that, if required, can be included in the script or passed as arguments to sbatch.

Upon successful submission, sbatch prints the assigned job ID (an integer). The submitted job can later be monitored or canceled using that ID.
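
When the job ID is needed by later commands, it can be captured in a shell variable at submission time. The following is a minimal sketch, not an official Discoverer recipe. It assumes the standard sbatch option --parsable, which prints only the job ID (optionally followed by a semicolon and the cluster name). The helper name parse_job_id is hypothetical.

```shell
#!/bin/bash
# Sketch: submit a batch script and keep the job ID for later use.

# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the ID.
# (parse_job_id is a hypothetical helper name used for illustration.)
parse_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Submit only where sbatch and the script are actually available,
# e.g. on the login node, in the directory containing job.batch.
if command -v sbatch >/dev/null 2>&1 && [ -f job.batch ]; then
    if job_output=$(sbatch --parsable job.batch); then
        job_id=$(parse_job_id "$job_output")
        echo "Submitted job ${job_id}"
    else
        echo "Submission failed" >&2
    fi
fi
```

The variable job_id can then be passed to squeue or scancel in the same shell session.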

Interactive job submission and execution

Warning

Running jobs interactively is neither promoted nor supported on Discoverer.

Monitoring submitted jobs

The simplest way to check the status of a job is to run:

squeue -j jobID

where jobID is the job ID returned by sbatch upon submission.

To list all your submitted jobs, run:

squeue --me

The squeue tool can provide extended information about the job size and its execution. The squeue documentation:

https://slurm.schedmd.com/squeue.html

describes all the options that can be passed to squeue.
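
As an illustration of those options, the sketch below polls squeue until a job is no longer active. It is one possible approach, not a Discoverer-endorsed tool. It assumes a single, non-array job and treats only a few common Slurm states as active. The function names job_state, is_active, and wait_for_job are hypothetical, and the 60-second default interval is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: wait until a job is no longer pending or running.

# Print the job state in extended form (e.g. PENDING, RUNNING),
# or nothing if squeue does not list the job.
job_state() {
    squeue --noheader --jobs="$1" --format='%T' 2>/dev/null
}

# Treat a few common states as "still active" (assumption of this sketch).
is_active() {
    case "$1" in
        PENDING|CONFIGURING|RUNNING|SUSPENDED|COMPLETING) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll every $2 seconds (default 60) while the job is active.
wait_for_job() {
    local job_id="$1" interval="${2:-60}" state
    while state=$(job_state "$job_id") && is_active "$state"; do
        echo "Job ${job_id}: ${state}"
        sleep "$interval"
    done
    echo "Job ${job_id} is no longer active (last state: ${state:-not listed})"
}
```

A call such as wait_for_job 123456 prints the state once per interval. Keep the interval generous, since frequent squeue calls add load to the scheduler.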

Canceling jobs

Only successfully submitted jobs can be canceled.

To cancel a job with ID jobID, use scancel:

scancel jobID

Warning

If a job cannot be canceled, ask the support team to cancel it (see Getting help).
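
To confirm that a cancellation took effect before contacting support, the job state can be checked shortly after calling scancel. The following is a minimal sketch under stated assumptions: the function name cancel_and_check is hypothetical, the 5-second default delay is arbitrary, and only the PENDING and RUNNING states are treated as a sign that the job has not been canceled yet.

```shell
#!/bin/bash
# Sketch: cancel a job, then check whether it is still pending or running.
# Returns 0 if the job is no longer pending/running, 1 if scancel failed,
# and 2 if the job is still pending or running after the delay.
cancel_and_check() {
    local job_id="$1" delay="${2:-5}" state
    scancel "$job_id" || return 1
    sleep "$delay"
    # An empty result means squeue does not list the job.
    state=$(squeue --noheader --jobs="$job_id" --format='%T' 2>/dev/null) || state=""
    case "$state" in
        PENDING|RUNNING)
            echo "Job ${job_id} is still ${state}"
            return 2
            ;;
        *)
            echo "Job ${job_id} is no longer pending or running (state: ${state:-not listed})"
            ;;
    esac
}
```

For example, cancel_and_check 123456 returns a non-zero exit code when the job still appears as pending or running, which is a reasonable point at which to contact support.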

Getting help

See Getting help