Why do we need periodic jobs

In a large system, there are many moving parts, and many tasks must run periodically (every minute, every hour, every month, and so on). Examples of such periodic jobs include:

  • Triggering a job that sends a daily or weekly digest to users
  • Refreshing a materialized view in a database
  • Evicting and cleaning up data caches
  • Running periodic maintenance

Cron jobs have long been the workhorse for scheduling and running such periodic jobs. The following is an example of how a Linux cron job works.

(Ref: https://code.tutsplus.com/tutorials/scheduling-tasks-with-cron-jobs--net-8800)
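To make the cron model concrete, here is a minimal Python sketch of how a cron daemon decides which entries are due. Each crontab entry has five fields (minute, hour, day of month, month, day of week), and the daemon checks every entry once a minute. This is an illustration under simplifying assumptions, not how any particular cron implementation is written: it supports only `*`, single numbers, comma lists, and `*/n` steps (no ranges or month/day names), and it requires all five fields to match, whereas standard cron runs a job when either day field matches if both are restricted. The job names are hypothetical.

```python
from datetime import datetime

# Simplified cron matcher: supports "*", "n", "a,b,c" and "*/n" only (no ranges or names).


def field_matches(field: str, value: int, low: int) -> bool:
    """Return True if one cron field matches `value`; `low` is the field's minimum (0 or 1)."""
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Step syntax counts from the field's minimum, e.g. */15 in minutes -> 0, 15, 30, 45.
            if (value - low) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression against a timestamp at minute resolution."""
    minute, hour, dom, month, dow = expr.split()
    # Cron numbers weekdays 0-6 starting at Sunday; datetime.weekday() uses Monday == 0.
    cron_weekday = (when.weekday() + 1) % 7
    # Simplification: all fields must match. Real cron matches on EITHER day field
    # when both day-of-month and day-of-week are restricted.
    return (
        field_matches(minute, when.minute, 0)
        and field_matches(hour, when.hour, 0)
        and field_matches(dom, when.day, 1)
        and field_matches(month, when.month, 1)
        and field_matches(dow, cron_weekday, 0)
    )


def due_jobs(crontab: dict[str, str], when: datetime) -> list[str]:
    """Return the names of all (hypothetical) jobs whose schedule matches `when`."""
    return [name for name, expr in crontab.items() if cron_matches(expr, when)]


if __name__ == "__main__":
    # Hypothetical crontab: a weekly digest on Mondays at 08:00, cache eviction every 15 minutes.
    crontab = {"weekly_digest": "0 8 * * 1", "evict_caches": "*/15 * * * *"}
    # A real daemon would call due_jobs() once per minute; here we check one instant.
    print(due_jobs(crontab, datetime(2024, 1, 1, 8, 0)))  # 2024-01-01 was a Monday
```

For example, `0 8 * * 1` fires at 08:00 every Monday, which fits a weekly digest, while `*/15 * * * *` fires every 15 minutes, which could drive cache eviction.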

Limitations of plain cron jobs

For a large system like Quora, there will be many jobs at each frequency. If a single machine initiates all of these jobs, the load will eventually exceed that machine's capacity; alternatively, if we stagger job runs to stay within that capacity, many jobs will start with high latency. Spawning too many jobs on a single server can overload it and increase the probability of job failures, so this approach is neither reliable nor scalable.
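The latency problem can be made concrete with a back-of-the-envelope sketch. It assumes, hypothetically, that all jobs become due at the same instant, that the machine can run at most `slots` jobs concurrently, and that every job takes the same `duration_s` seconds; jobs beyond capacity wait for earlier ones to finish.

```python
def start_delays(num_jobs: int, slots: int, duration_s: float) -> list[float]:
    """Seconds each job waits before starting on one machine.

    Hypothetical model: all jobs become due at the same instant, at most `slots`
    run concurrently, and every job takes `duration_s` seconds, so jobs are
    admitted in waves of `slots` and each wave waits for the previous one.
    """
    if slots <= 0:
        raise ValueError("slots must be positive")
    return [(i // slots) * duration_s for i in range(num_jobs)]


if __name__ == "__main__":
    delays = start_delays(num_jobs=1000, slots=50, duration_s=30.0)
    print(f"last job starts {max(delays):.0f} s late")
```

Under these assumptions, 1,000 jobs due at the top of the hour on a machine with 50 slots and 30-second jobs give `max(start_delays(1000, 50, 30.0)) == 570.0`: the last job starts 9.5 minutes late, long after a per-minute job should have run again. The numbers are illustrative, not measurements from any real system.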

Manually distributing the load

One solution is to use more than one machine and distribute the jobs among them. For example, with two machines, we might run odd-minute jobs on one machine and even-minute jobs on the other. However, this approach has several challenges:

  • If a machine fails, manual intervention is probably required to handle the failure.
  • As the number of machines grows, the script that balances the load becomes unwieldy, and the system is no longer robust.
  • Detecting stuck jobs and preventing duplicate runs of the same job become difficult.
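The static odd/even split, generalized to N machines with `minute % N`, can be sketched in Python to show the first two challenges. The machine names are hypothetical, and the sketch only models which machine owns which minute of the hour, not how jobs are actually launched.

```python
def assign_minute(minute: int, machines: list[str]) -> str:
    """Static split: with two machines, even minutes go to machines[0], odd minutes to machines[1]."""
    return machines[minute % len(machines)]


def missed_minutes(machines: list[str], failed: set[str]) -> list[int]:
    """Minutes of the hour whose jobs silently do not run because their owner is down."""
    return [m for m in range(60) if assign_minute(m, machines) in failed]


def reassigned_minutes(old: list[str], new: list[str]) -> int:
    """How many minutes change owner when the machine list changes (each one means a crontab edit)."""
    return sum(assign_minute(m, old) != assign_minute(m, new) for m in range(60))


if __name__ == "__main__":
    pair = ["cron-a", "cron-b"]  # hypothetical machine names
    print(len(missed_minutes(pair, failed={"cron-b"})))  # odd minutes go unserved
    print(reassigned_minutes(pair, pair + ["cron-c"]))   # minutes that move to a new owner
```

With machines `cron-a` and `cron-b`, a failure of `cron-b` means `missed_minutes` returns all 30 odd minutes, and nothing reassigns them until someone intervenes. Adding a third machine changes the owner of 40 of the 60 minutes, so most crontabs have to be rewritten, which is why the balancing script becomes unwieldy as the cluster grows.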
