Job submission and monitoring
Most work on Apollo is done by submitting jobs to the queueing system. A job is defined by its submission script, which specifies what programs to run and what resources will be needed for them.
Submission Scripts
An example submission script:
#!/bin/bash
#SBATCH --job-name=lammps
#SBATCH --partition=compute
#SBATCH --time=2:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --account=pilot
module purge
module load LAMMPS/2Aug2023_update2-foss-2023a-intel
export OMP_NUM_THREADS=2
echo $SLURM_NODELIST
echo
mpirun --bind-to core -np $SLURM_NTASKS lmp -k on t 2 -sf kk < in.eam
The #SBATCH lines tell the queueing system what resources are being requested for this job. In this case it is requesting 2 nodes, with 64 tasks per node (one core per task by default), from the compute partition (queue) for 2 hours, charged to the pilot account.
The other lines are executed when the job runs. In this example, the script unloads any previously loaded modules, loads the LAMMPS/2Aug2023_update2-foss-2023a-intel module, and sets the number of OpenMP threads to 2 per task. It then prints the nodes allocated to this job and launches LAMMPS with mpirun, using one MPI process per requested task.
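As a rough check of what the example script asks for, the arithmetic can be sketched in the shell. The variable names below are illustrative only; the values are copied from the #SBATCH lines and OMP_NUM_THREADS in the example.

```shell
# Values copied from the example submission script.
nodes=2              # --nodes=2
ntasks_per_node=64   # --ntasks-per-node=64
omp_threads=2        # OMP_NUM_THREADS=2

# Total MPI tasks across the job (what $SLURM_NTASKS will hold).
total_tasks=$(( nodes * ntasks_per_node ))
# Threads started on each node, and across the whole job.
threads_per_node=$(( ntasks_per_node * omp_threads ))
total_threads=$(( total_tasks * omp_threads ))

echo "MPI tasks in total: $total_tasks"
echo "Threads per node:   $threads_per_node"
echo "Threads in total:   $total_threads"
```

This gives 128 MPI tasks, 128 threads per node, and 256 threads in total. Whether 128 threads per node is a sensible load depends on the number of cores in each node, which is not stated here.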
Submitting and Cancelling Jobs
To submit a job use the sbatch command. For example, if the job's submission script is called "job.sh":
$ sbatch job.sh
Submitted batch job 26821
When the queueing system accepts a job it assigns it a job number. In this example the job is 26821.
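When a script needs the job number later, for example to cancel or monitor the job, it can be pulled out of the confirmation line. The following is a minimal sketch; parse_job_id is a hypothetical helper name, and it assumes the message has the form shown above, with the job number as the last word.

```shell
# Extract the job number from sbatch's confirmation line.
# Assumes the number is the last space-separated word.
parse_job_id() {
    line=$1
    # Strip everything up to and including the final space.
    echo "${line##* }"
}

# Demonstrated on the message from the example above.
# On the cluster this could be: job_id=$(parse_job_id "$(sbatch job.sh)")
job_id=$(parse_job_id "Submitted batch job 26821")
echo "$job_id"
```

Alternatively, sbatch accepts a --parsable option that prints only the job number (followed by a cluster name in multi-cluster setups), which avoids parsing the message at all.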
To cancel a job use the scancel command:
$ scancel 26821
If the job has already completed or been cancelled then you may get an error message.
Listing Jobs
Once a job has been submitted it will sit in the queue until resources are available to run it. Several commands are available for listing running and queued jobs.
squeue usually provides the best overview of what jobs are queued and running.
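Two common ways of narrowing squeue's output are sketched below as small shell functions. The function names my_jobs and job_state are illustrative, not part of Slurm; the squeue options used (-u, -h, -j, -o) are standard.

```shell
# List only the current user's jobs (-u filters by user name).
my_jobs() {
    squeue -u "$USER"
}

# Print just the state (for example PENDING or RUNNING) of one job.
# -h drops the header line, -j selects the job, -o "%T" prints the state.
job_state() {
    squeue -h -j "$1" -o "%T"
}

# Example use on the cluster:
#   my_jobs
#   job_state 26821
```

Wrapping the calls in functions keeps the option strings in one place, so they can be adjusted without retyping them each time.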