Job submission and monitoring

Most work on Apollo is done by submitting jobs to the queueing system. A job is defined by its submission script, which specifies what programs to run and what resources will be needed for them.

Submission Scripts

An example submission script:

#!/bin/bash
#SBATCH --job-name=lammps
#SBATCH --partition=compute
#SBATCH --time=2:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --account=pilot

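module purge
module load LAMMPS/2Aug2023_update2-foss-2023a-intel

# Use two OpenMP threads per MPI task
export OMP_NUM_THREADS=2

# Print the nodes allocated to this job, followed by a blank line
echo "$SLURM_NODELIST"
echo

# Run LAMMPS with the Kokkos package, two threads per task
mpirun --bind-to core -np "$SLURM_NTASKS" lmp -k on t 2 -sf kk < in.eam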


The #SBATCH lines tell the queueing system what resources the job is requesting. In this case the job requests 2 nodes, with 64 tasks (cores) per node, from the compute partition (queue) for 2 hours. The job is named lammps and runs under the pilot account.

The other lines are executed when the job runs. In this example, the script unloads any previously loaded modules, loads the LAMMPS/2Aug2023_update2-foss-2023a-intel module, and sets the number of OpenMP threads to 2 per task. It then prints the nodes allocated to the job and, finally, runs LAMMPS through mpirun with one MPI process per task, reading its input from in.eam.
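The totals implied by the example script can be checked with a little shell arithmetic. This is only a sketch: the variable names are illustrative, and the values are copied by hand from the #SBATCH and OMP_NUM_THREADS lines above.

```shell
#!/bin/bash
# Values copied from the example submission script (names are illustrative).
nodes=2
ntasks_per_node=64
threads_per_task=2

# Total MPI tasks across the job, and threads started on each node.
total_tasks=$((nodes * ntasks_per_node))
threads_per_node=$((ntasks_per_node * threads_per_task))

echo "MPI tasks in total: $total_tasks"
echo "Threads per node:   $threads_per_node"
```

If the number of threads per node is larger than the number of cores on a node, the threads will share cores. Whether that applies here depends on the node hardware, which this page does not state.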

Submitting and Cancelling Jobs

To submit a job, use the sbatch command. For example, if the job's submission script is called "job.sh":

$ sbatch job.sh
Submitted batch job 26821 

When the queueing system accepts a job, it assigns the job a number. In this example the job number is 26821.
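When submitting from another script, it can be handy to capture the job number for later use. The following is a minimal sketch: job_id_from_sbatch is a hypothetical helper, not part of Slurm, and it assumes the confirmation message has the form shown above.

```shell
#!/bin/bash
# Hypothetical helper: read sbatch's confirmation message on stdin and
# print the job number (the fourth whitespace-separated field).
job_id_from_sbatch() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstrated on the sample message rather than on a live submission.
job_id=$(echo "Submitted batch job 26821" | job_id_from_sbatch)
echo "$job_id"
```

On the cluster, the same helper could be fed from a real submission, for example job_id=$(sbatch job.sh | job_id_from_sbatch). sbatch also has a --parsable option that prints the job number without the surrounding text.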

To cancel a job use the scancel command:

$ scancel 26821 

If the job has already completed or been cancelled, scancel may report an error.

Listing Jobs

Once a job has been submitted, it waits in the queue until resources are available to run it. Several commands are available for listing running and queued jobs.

squeue usually provides the best overview of what jobs are queued and running.
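By default squeue lists jobs from all users, so it is often more useful to restrict the output to your own jobs. A minimal sketch, assuming it is run on a machine where the Slurm client tools are installed (the status variable is illustrative only):

```shell
#!/bin/bash
# List only the current user's jobs, if squeue is available on this machine.
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$(id -un)"
    status="listed"
else
    echo "squeue not found; run this on a cluster login node" >&2
    status="unavailable"
fi
```

To look at a single job instead, squeue accepts a job number with the -j option, for example squeue -j 26821.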