Job Schedulers

Understand the role of job schedulers in HPC systems, learn how batch queuing manages resource-intensive jobs, and explore popular schedulers such as PBS and SLURM, which are used to run computational tasks efficiently.

What is an HPC Job?

In HPC, we talk a lot about jobs. A job is simply a set of commands we wish to run, together with a request for resources (e.g. compute time, disk space, memory, software environments). HPC jobs are generally time-consuming and resource-intensive, so they are usually run non-interactively. They can also be run interactively, but this is mainly done for testing purposes. You add your jobs to a queue, and they run when machines have free resources. Once a job is complete, you can inspect its output.
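
As a rough illustration of this idea, a job can be modelled as a list of commands plus the resources it requests. The Python sketch below uses a hypothetical `JobRequest` class and `render_slurm_script` helper (neither is part of any scheduler) to print the kind of batch script SLURM accepts. The options `--job-name`, `--nodes`, `--time` and `--mem` are standard `sbatch` options; the values are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    """Hypothetical container: what to run and which resources to ask for."""
    name: str
    nodes: int                      # number of compute nodes requested
    walltime: str                   # maximum run time, HH:MM:SS
    memory: str                     # memory per node, e.g. "4G"
    commands: list[str] = field(default_factory=list)


def render_slurm_script(job: JobRequest) -> str:
    """Render the request as the text of a SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --nodes={job.nodes}",
        f"#SBATCH --time={job.walltime}",
        f"#SBATCH --mem={job.memory}",
        "",
        *job.commands,              # the commands the job actually runs
    ]
    return "\n".join(lines) + "\n"


# Illustrative values only; adjust them to your own cluster.
job = JobRequest(
    name="demo",
    nodes=1,
    walltime="00:10:00",
    memory="4G",
    commands=["hostname"],
)
script = render_slurm_script(job)
print(script)
```

On a SLURM cluster, saving the printed text to a file and submitting it with `sbatch` would place the job in the queue.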

A logical diagram of the HPC batch system and scheduler

Batch queuing systems and job schedulers

The batch system is a program (typically residing on the head node) that lets you add jobs to the queue, remove them from it, and monitor it. It is driven by scripts and command-line tools.

The job scheduler manages the job queues. Typically, it starts jobs from the queue as sufficient resources (more precisely, cluster compute nodes) become idle. You do not need to interact with the scheduler directly.
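
The behaviour described above can be sketched with a toy simulation. The Python code below is a minimal first-come, first-served model, not how PBS or SLURM is implemented: the `Job` class, the `simulate_fifo` function and the integer "time steps" are all assumptions made for illustration. A job starts only when it reaches the head of the queue and enough nodes are idle.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # compute nodes requested
    runtime: int    # time steps the job needs once started


def simulate_fifo(jobs, total_nodes):
    """Return {job name: start time} for a strict first-come, first-served queue."""
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than the cluster has")

    queue = deque(jobs)     # jobs waiting, in submission order
    running = []            # (finish_time, job) pairs
    free = total_nodes      # idle compute nodes
    start_times = {}
    clock = 0

    while queue or running:
        # Release the nodes of every job that has finished by now.
        for entry in [e for e in running if e[0] <= clock]:
            running.remove(entry)
            free += entry[1].nodes

        # Start jobs from the head of the queue while enough nodes are idle.
        while queue and queue[0].nodes <= free:
            job = queue.popleft()
            free -= job.nodes
            start_times[job.name] = clock
            running.append((clock + job.runtime, job))

        clock += 1

    return start_times


jobs = [Job("a", nodes=2, runtime=3), Job("b", nodes=4, runtime=2), Job("c", nodes=1, runtime=1)]
print(simulate_fifo(jobs, total_nodes=4))   # {'a': 0, 'b': 3, 'c': 5}
```

In this run, job `c` waits behind `b` even though a node is idle at time 0. Production schedulers usually refine plain queue order with priorities and backfilling to avoid leaving nodes idle in this way.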

Some widely used cluster batch systems are:

  • Portable Batch System (PBS)
  • Simple Linux Utility for Resource Management (SLURM)
  • Moab
  • Univa Grid Engine
  • LoadLeveler
  • Condor
  • OpenLava
  • IBM’s Platform LSF

The first two in the list above are popular, so in this course we restrict our discussion to these two.