Job submission and monitoring

Most work on Lovelace is done by submitting jobs to the queueing system. A job is defined by its submission script, which specifies what programs to run and what resources will be needed for them.

Submission Scripts

An example submission script:

#!/bin/bash
#PBS -q compute
#PBS -N test_job
#PBS -l walltime=00:10:00
#PBS -l nodes=1:ppn=40
#PBS -A training2020
cd $PBS_O_WORKDIR
module load intel/compiler/64/2018/18.0.5
${HOME}/bin/test_script

The #PBS lines tell the queueing system what resources are being requested for this job. In this case it is requesting a single 40-core node from the compute queue for 10 minutes. The cost of the job will be billed to the training2020 project code. The job is called "test_job".

The other lines are executed when the job runs. In this example the job changes to the directory it was submitted from (PBS_O_WORKDIR), loads a module and then runs a program called test_script.
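
To check what a submission script is requesting before it is submitted, the #PBS lines can be pulled out on their own. The following is a minimal sketch; pbs_directives is a hypothetical helper name chosen for illustration and is not part of the queueing system.

```shell
#!/bin/bash
# pbs_directives is a hypothetical helper, not a PBS command.
# It prints the options from each #PBS line of a submission script,
# one per line, so the requested resources can be reviewed at a glance.
pbs_directives() {
    # Keep only lines that begin with "#PBS", then strip that prefix.
    grep '^#PBS' "$1" | sed 's/^#PBS[[:space:]]*//'
}

# Example use (assumes a script called job.sh in the current directory):
#   pbs_directives job.sh
```

Run against the example script above, this would list the queue, job name, walltime, node request and project code, and skip the lines that are executed when the job runs.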

Submitting and Cancelling Jobs

To submit a job use the qsub command. For example, if the job's submission script is called "job.sh":

$ qsub job.sh 
56908.master01.cm.cluster

When the queueing system accepts a job it assigns it a job number. In this example the job is 56908.
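
In a script it can be useful to keep only the job number from the identifier that qsub prints. A minimal sketch, assuming the identifier always has the form number.hostname as in the example above; job_number is a hypothetical helper name.

```shell
#!/bin/bash
# job_number is a hypothetical helper, not a PBS command.
# It removes everything from the first "." onwards, so that
# "56908.master01.cm.cluster" becomes "56908".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Typical use (needs a working qsub, so it is left commented out here):
#   full_id=$(qsub job.sh)
#   job_num=$(job_number "$full_id")

# Demonstration with the identifier from the example above.
job_number 56908.master01.cm.cluster
```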

To cancel a job use the qdel command with the job's number.

$ qdel 56908

If the job has already completed or been cancelled then you may get an error message.
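
When qdel is called from a script, that error can be handled rather than left to stop the script. The following is a sketch only; cancel_job is a hypothetical wrapper, and it relies on qdel returning a non-zero exit status when it cannot cancel the job.

```shell
#!/bin/bash
# cancel_job is a hypothetical wrapper around qdel, not a PBS command.
cancel_job() {
    if qdel "$1"; then
        echo "Cancelled job $1"
    else
        # qdel could not cancel the job, for example because it has
        # already completed or been cancelled.
        echo "Could not cancel job $1 (it may already have finished)" >&2
        return 1
    fi
}

# Example use:
#   cancel_job 56908
```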

Listing Jobs

Once a job has been submitted it will sit in the queue until resources are available to run it. Several commands are available to list the running and queued jobs.

showq usually provides the best overview of what jobs are queued and running.