Data job automation with cron

Data Processing in Shell

Susan Sun

Data Person

What is a scheduler?

  • A scheduler runs jobs on a predetermined schedule
  • Other popular schedulers: Airflow, Luigi, Rundeck, etc.
  • The cron scheduler is
    • simple
    • free
    • customizable
    • purely command-line
    • native to macOS and Linux

What is cron?

Cron:

  • is a time-based job scheduler
  • comes pre-installed on macOS and most Unix-like systems, including Linux
  • can be installed on Windows via Cygwin or replaced with Windows Task Scheduler
  • is used to automate jobs like system maintenance, bash scripts, Python jobs, etc.

What is crontab?

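Crontab (short for "cron table") is the per-user file that keeps track of cron jobs.

List the current jobs (a user with none scheduled sees the message below):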

crontab -l
no crontab for <username>

Documentation:

man crontab
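
A minimal sketch of a listing helper that handles the "no crontab" case shown above. The function name list_jobs is hypothetical, and the sketch assumes crontab -l exits with a non-zero status when no crontab is installed, as common cron implementations do.

```shell
# Hypothetical helper: report the current user's cron jobs, or say why there are none.
list_jobs() {
  # Bail out gracefully on systems where cron is not installed at all.
  if ! command -v crontab >/dev/null 2>&1; then
    echo "crontab command not found"
    return 0
  fi
  # Assumption: crontab -l fails (non-zero exit) when the user has no crontab yet.
  if entries="$(crontab -l 2>/dev/null)"; then
    echo "Current cron jobs:"
    printf '%s\n' "$entries"
  else
    echo "No crontab installed for $(id -un)"
  fi
}

list_jobs
```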

Add a job to crontab

Method 1: modify crontab using a text editor (e.g. nano, Vim, Emacs)

Method 2: echo the schedule line into crontab (this replaces any existing crontab)

echo "* * * * * python create_model.py" | crontab

Check if the job is properly scheduled:

crontab -l
* * * * * python create_model.py
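
Piping a line into crontab installs that line as the entire crontab, so any jobs already scheduled are lost. A common workaround is to print the existing entries first and append the new one. The sketch below illustrates that idea; the helper name append_job is hypothetical, and the final install step is left commented out so that running the sketch does not touch a real crontab.

```shell
# Hypothetical helper: read the current crontab text on stdin and print it
# with one new entry appended, unless an identical line is already present.
append_job() {
  new_job="$1"
  existing="$(cat)"   # existing crontab contents arrive on stdin
  if printf '%s\n' "$existing" | grep -Fxq -- "$new_job"; then
    # The job is already scheduled: print the crontab unchanged.
    printf '%s\n' "$existing"
  elif [ -n "$existing" ]; then
    printf '%s\n%s\n' "$existing" "$new_job"
  else
    printf '%s\n' "$new_job"
  fi
}

# Usage (assumes "-" means standard input, as on common cron implementations):
# crontab -l 2>/dev/null | append_job "* * * * * python create_model.py" | crontab -
```

Running the usage line twice leaves a single copy of the job, because the helper skips lines that already match exactly.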

Learning to time a cron job

The most frequent schedule cron supports is once per minute; it cannot schedule jobs at intervals of seconds.

Breaking down the time component for a cron job:

.---------------- minute (0 - 59)
|  .------------- hour (0 - 23)
|  |  .---------- day of month (1 - 31)
|  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
|  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed ...
|  |  |  |  |
*  *  *  *  * command-to-be-executed
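
To make the field order concrete, the sketch below splits one crontab entry into its five time fields and the command. The function name describe_entry and the sample schedule (02:30 every Monday) are illustrative assumptions, not part of the course material.

```shell
# Hypothetical helper: label each field of a single crontab entry.
describe_entry() {
  printf '%s\n' "$1" | {
    # read splits on whitespace; the last variable receives the rest of the line,
    # so a command containing spaces stays in one piece.
    read -r minute hour dom month dow command
    printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s command=%s\n' \
      "$minute" "$hour" "$dom" "$month" "$dow" "$command"
  }
}

describe_entry "30 2 * * 1 python create_model.py"
# prints: minute=30 hour=2 day-of-month=* month=* day-of-week=1 command=python create_model.py
```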

Learning to time a cron job

*  *  *  *  * python create_model.py

Interpretation:

  • Run at every minute of every hour, on every day of the month, in every month, and on every day of the week.

  • In short, run every minute
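
Replacing a wildcard with a number restricts that field. The sketch below prints a few illustrative variations of the same job; the schedules are made-up examples, and the entries are only printed, not installed.

```shell
# Hypothetical schedule variations for one command; only the time fields change.
examples="$(cat <<'EOF'
# every minute (all five fields are wildcards)
* * * * * python create_model.py
# at minute 15 of every hour
15 * * * * python create_model.py
# every day at 02:30
30 2 * * * python create_model.py
# every Sunday at 09:00
0 9 * * 0 python create_model.py
# at midnight on the first day of every month
0 0 1 * * python create_model.py
EOF
)"

# Print the entries for review.
printf '%s\n' "$examples"
```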

Further resources:


Let's practice!
