Introduction to Docker caching

Introduction to Docker

Tim Sangster

Software Engineer @ DataCamp

Docker build

Downloading and unzipping a file using Docker instructions:

RUN curl http://example.com/example_folder.zip -o example_folder.zip
RUN unzip example_folder.zip

These instructions change the file system and add:

/example_folder.zip
/example_folder/
    example_file1
    example_file2

It is these changes that are stored in the image.
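
As a rough illustration of what "changes" means here, the sketch below models a file system as a set of paths and works out what the two RUN instructions add. It is a toy model in Python, not how Docker stores changes internally; the function name layer_diff is hypothetical, and the starting file system is assumed to be empty for simplicity.

```python
def layer_diff(before, after):
    """Return the paths that exist after an instruction but not before it."""
    return after - before


# File system before the two RUN instructions (assumed empty for simplicity).
before = set()

# File system after downloading and unzipping the archive.
after = {
    "/example_folder.zip",
    "/example_folder/example_file1",
    "/example_folder/example_file2",
}

# These added paths are what the image would need to remember.
added = layer_diff(before, after)
for path in sorted(added):
    print(path)
```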

Docker instructions are linked to file system changes

Each instruction in the Dockerfile is linked to the changes it made in the image file system.

FROM docker.io/library/ubuntu
 => Gives us a file system to start from with all files needed to run Ubuntu
COPY /pipeline/ /pipeline/
 => Creates the /pipeline/ folder
 => Copies multiple files in the /pipeline/ folder
RUN apt-get install -y python3
 => Adds the python3 files to the file system (for example /usr/bin/python3)

Docker layers

  • Docker layer: All changes caused by a single Dockerfile instruction.
  • Docker image: All layers created during a build.

--> Docker image: All changes to the file system by all Dockerfile instructions.

While building an image from a Dockerfile, Docker tells us which layer it is working on:

 => [1/3] FROM docker.io/library/ubuntu 
 => [2/3] RUN apt-get update
 => [3/3] RUN apt-get install -y python3
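
The relationship between layers and an image can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each layer only adds paths, the paths listed are illustrative placeholders rather than real layer contents, and image_file_system is a hypothetical helper name.

```python
# Each layer is modelled as (instruction, set of paths that instruction added).
# The paths are illustrative placeholders, not the real contents of each layer.
layers = [
    ("FROM docker.io/library/ubuntu", {"/bin/bash", "/etc/os-release"}),
    ("RUN apt-get update", {"/var/lib/apt/lists/index"}),
    ("RUN apt-get install -y python3", {"/usr/bin/python3"}),
]


def image_file_system(layers):
    """Combine the changes of every layer, in order, into one file system."""
    files = set()
    for _instruction, added in layers:
        files |= added
    return files


# Mimic the numbered step lines Docker prints during a build.
for number, (instruction, _added) in enumerate(layers, start=1):
    print(f" => [{number}/{len(layers)}] {instruction}")

print(sorted(image_file_system(layers)))
```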

Docker caching

Consecutive builds are much faster because Docker re-uses layers that haven't changed.

Re-running a build:

 => [1/3] FROM docker.io/library/ubuntu
 => CACHED [2/3] RUN apt-get update
 => CACHED [3/3] RUN apt-get install -y python3

Re-running a build but with changes:

 => [1/3] FROM docker.io/library/ubuntu
 => CACHED [2/3] RUN apt-get update
 => [3/3] RUN apt-get install -y r-base
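
The behaviour above can be imitated with a small Python model. It is only a sketch: it assumes a layer's cache key is a hash of the instruction text plus the key of the layer before it, which is a simplification of what Docker actually does. The build function and the BUILT label are hypothetical, and unlike Docker's real output the model also marks the FROM step as cached.

```python
import hashlib


def build(instructions, cache):
    """Print which steps are rebuilt and return the status of each step."""
    statuses = []
    parent_key = ""
    for number, instruction in enumerate(instructions, start=1):
        # A layer's key depends on its own text and on every layer before it.
        key = hashlib.sha256((parent_key + instruction).encode()).hexdigest()
        if key in cache:
            status = "CACHED"
        else:
            status = "BUILT"
            cache.add(key)
        statuses.append(status)
        print(f" => {status} [{number}/{len(instructions)}] {instruction}")
        parent_key = key
    return statuses


cache = set()
original = ["FROM ubuntu", "RUN apt-get update", "RUN apt-get install -y python3"]
changed = ["FROM ubuntu", "RUN apt-get update", "RUN apt-get install -y r-base"]

first = build(original, cache)   # nothing is cached yet
second = build(original, cache)  # identical instructions: everything is reused
third = build(changed, cache)    # only the changed last step is rebuilt
```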

Understanding Docker caching

Knowing when layers are cached helps us understand why images sometimes don't change after a rebuild.

  • Docker can't know when a new version of python3 is released.
  • Docker will use cached layers because the instructions are identical to those of previous builds.

 => [1/3] FROM docker.io/library/ubuntu
 => CACHED [2/3] RUN apt-get update
 => CACHED [3/3] RUN apt-get install -y python3
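
To see why an identical instruction can leave an image stale, consider the toy Python model below. It assumes the cache is looked up by instruction text only; the version numbers and the names repository and run_cached are made up for illustration.

```python
# Hypothetical latest python3 version in the package repository; it changes over time.
repository = {"python3": "3.10"}

# Cache maps instruction text to the result the instruction produced when it last ran.
cache = {}


def run_cached(instruction):
    """Run an instruction, reusing the stored result if the text is unchanged."""
    if instruction not in cache:
        # Only on a cache miss do we actually look at the repository.
        cache[instruction] = repository["python3"]
    return cache[instruction]


first_build = run_cached("RUN apt-get install -y python3")

# A new version is released, but the Dockerfile text stays the same.
repository["python3"] = "3.11"
second_build = run_cached("RUN apt-get install -y python3")

print(first_build, second_build)  # prints "3.10 3.10"
```

In the model, the second build still reports the old version because the lookup never reaches the repository. With real Docker, one way to force every instruction to run again is to pass the --no-cache option to docker build.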

Understanding Docker caching

Understanding caching helps us write Dockerfiles that build faster, because not all layers need to be rebuilt.

In the following Dockerfile, every instruction from COPY onward needs to be re-run if the pipeline.py file is changed:

FROM ubuntu
COPY /app/pipeline.py /app/pipeline.py
RUN apt-get update
RUN apt-get install -y python3

 => [1/4] FROM docker.io/library/ubuntu
 => [2/4] COPY /app/pipeline.py /app/pipeline.py
 => [3/4] RUN apt-get update
 => [4/4] RUN apt-get install -y python3

Understanding Docker caching

Understanding caching helps us write Dockerfiles that build faster, because not all layers need to be rebuilt.

In the following Dockerfile, only the COPY instruction needs to be re-run if the pipeline.py file is changed:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
COPY /app/pipeline.py /app/pipeline.py

 => [1/4] FROM docker.io/library/ubuntu
 => CACHED [2/4] RUN apt-get update
 => CACHED [3/4] RUN apt-get install -y python3
 => [4/4] COPY /app/pipeline.py /app/pipeline.py
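
The effect of instruction order can be compared with a short Python sketch. It rests on simplifying assumptions: a layer's cache key is a hash of the previous key plus the instruction text, and for COPY the content of the copied file is mixed in as well. The function rebuilt_steps and the file contents are hypothetical.

```python
import hashlib


def rebuilt_steps(instructions, file_content, cache):
    """Return the instructions that must be re-run for this build."""
    rebuilt = []
    parent_key = ""
    for instruction in instructions:
        material = parent_key + instruction
        if instruction.startswith("COPY"):
            # For COPY, the copied file's content is part of the cache key.
            material += file_content
        key = hashlib.sha256(material.encode()).hexdigest()
        if key not in cache:
            rebuilt.append(instruction)
            cache.add(key)
        parent_key = key
    return rebuilt


copy_first = [
    "FROM ubuntu",
    "COPY /app/pipeline.py /app/pipeline.py",
    "RUN apt-get update",
    "RUN apt-get install -y python3",
]
# Same instructions, but with COPY moved to the end.
copy_last = [copy_first[0], copy_first[2], copy_first[3], copy_first[1]]

results = {}
for name, dockerfile in (("copy_first", copy_first), ("copy_last", copy_last)):
    cache = set()
    rebuilt_steps(dockerfile, "print('v1')", cache)  # initial build
    # Second build after pipeline.py has been edited.
    results[name] = rebuilt_steps(dockerfile, "print('v2')", cache)
    print(name, "->", len(results[name]), "step(s) re-run")
```

In this model, placing COPY first forces three steps to run again after the file is edited, while placing it last re-runs only the COPY step.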

Let's practice!
