Introduction to Docker
Tim Sangster
Software Engineer @ DataCamp
Downloading and unzipping a file using Dockerfile instructions:
RUN curl -o example_folder.zip http://example.com/example_folder.zip
RUN unzip example_folder.zip
These instructions change the file system by adding:
/example_folder.zip
/example_folder/
    example_file1
    example_file2
It is these changes that are stored in the image.
Each instruction in the Dockerfile is linked to the changes it made in the image file system.
FROM docker.io/library/ubuntu
=> Gives us a file system to start from with all files needed to run Ubuntu
COPY /pipeline/ /pipeline/
=> Creates the /pipeline/ folder
=> Copies multiple files into the /pipeline/ folder
RUN apt-get install -y python3
=> Adds the python3 files to the file system (for example under /usr/bin/ and /usr/lib/)
--> Docker image: All changes to the file system by all Dockerfile instructions.
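The idea that an image is the sum of each instruction's file-system changes can be sketched with a toy model. This is an illustration only, not how Docker stores layers on disk; the paths, including /pipeline/step1.py, are hypothetical.

```python
# Toy model: each layer records the paths its instruction added.
# Illustration only; all paths below are made up for the example.
layers = [
    ("FROM docker.io/library/ubuntu", {"/bin/", "/etc/", "/usr/"}),
    ("COPY /pipeline/ /pipeline/", {"/pipeline/", "/pipeline/step1.py"}),
    ("RUN apt-get install -y python3", {"/usr/bin/python3"}),
]


def image_filesystem(layers):
    """Union the changes of every layer, in order, into one file system."""
    filesystem = set()
    for _instruction, added_paths in layers:
        filesystem |= added_paths
    return filesystem


fs = image_filesystem(layers)
print(sorted(fs))
```

The model only tracks added paths; real layers can also modify or delete files.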
While building an image from a Dockerfile, Docker reports which layer it is working on:
=> [1/3] FROM docker.io/library/ubuntu
=> [2/3] RUN apt-get update
=> [3/3] RUN apt-get install -y python3
Consecutive builds are much faster because Docker re-uses layers that haven't changed.
Re-running a build:
=> [1/3] FROM docker.io/library/ubuntu
=> CACHED [2/3] RUN apt-get update
=> CACHED [3/3] RUN apt-get install -y python3
Re-running a build after changing the last instruction:
=> [1/3] FROM docker.io/library/ubuntu
=> CACHED [2/3] RUN apt-get update
=> [3/3] RUN apt-get install -y r-base
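The reuse rule behind this output can be sketched as a prefix comparison: a step is reused only while it and every step before it match the previous build. The function name steps_to_rerun is hypothetical, and comparing instruction text alone is a simplification of what Docker actually checks.

```python
def steps_to_rerun(previous, current):
    """Return the instructions that must run again.

    Toy rule: everything from the first differing instruction onward
    is re-run; everything before it is reused from the cache.
    """
    for i, instruction in enumerate(current):
        if i >= len(previous) or previous[i] != instruction:
            return current[i:]
    return []


previous = [
    "FROM docker.io/library/ubuntu",
    "RUN apt-get update",
    "RUN apt-get install -y python3",
]
# Same Dockerfile, but the last instruction now installs R
# (r-base is the Ubuntu package name for R).
changed = previous[:2] + ["RUN apt-get install -y r-base"]

print(steps_to_rerun(previous, previous))  # []
print(steps_to_rerun(previous, changed))   # ['RUN apt-get install -y r-base']
```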
Understanding when layers are cached helps explain why an image sometimes doesn't change after a rebuild.
=> [1/3] FROM docker.io/library/ubuntu
=> CACHED [2/3] RUN apt-get update
=> CACHED [3/3] RUN apt-get install -y python3
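One way to see why a rebuild can leave the image unchanged: in the toy model below, the cache is keyed by the instruction text alone, so a change outside the Dockerfile is never noticed. The upstream dictionary and its version numbers are made up for illustration, and keying on text alone is a simplifying assumption.

```python
# Toy model: results are cached per instruction text.
cache = {}


def run_step(instruction, execute):
    """Run the step once; later calls with the same text reuse the result."""
    if instruction not in cache:
        cache[instruction] = execute()
    return cache[instruction]


# Hypothetical stand-in for the package version available upstream.
upstream = {"python3": "3.10"}

first = run_step("RUN apt-get install -y python3", lambda: upstream["python3"])
upstream["python3"] = "3.12"  # upstream changes between the two builds
second = run_step("RUN apt-get install -y python3", lambda: upstream["python3"])

print(first, second)  # 3.10 3.10
```

The second build still sees the old result because the instruction text did not change.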
Understanding caching also helps us write Dockerfiles that build faster, because not all layers need to be rebuilt.
In the following Dockerfile, the COPY instruction and every instruction after it must be re-run whenever the pipeline.py file changes:
FROM ubuntu
COPY /app/pipeline.py /app/pipeline.py
RUN apt-get update
RUN apt-get install -y python3
=> [1/4] FROM docker.io/library/ubuntu
=> [2/4] COPY /app/pipeline.py /app/pipeline.py
=> [3/4] RUN apt-get update
=> [4/4] RUN apt-get install -y python3
In the following Dockerfile, only the COPY instruction needs to be re-run when the pipeline.py file changes:
FROM ubuntu
RUN apt-get update
RUN apt-get install -y python3
COPY /app/pipeline.py /app/pipeline.py
=> [1/4] FROM docker.io/library/ubuntu
=> CACHED [2/4] RUN apt-get update
=> CACHED [3/4] RUN apt-get install -y python3
=> [4/4] COPY /app/pipeline.py /app/pipeline.py
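The effect of instruction order can be compared with a small sketch. It assumes a simplified cache key: the instruction text, plus a hash of the copied file for COPY steps. The helper names cache_keys and rerun_count and the file contents b"v1" and b"v2" are hypothetical.

```python
import hashlib


def cache_keys(instructions, file_contents):
    """Build one key per step; COPY keys also include the copied file's hash."""
    keys = []
    for instruction in instructions:
        key = instruction
        if instruction.startswith("COPY"):
            key += ":" + hashlib.sha256(file_contents).hexdigest()
        keys.append(key)
    return keys


def rerun_count(instructions, old_file, new_file):
    """Count the steps that re-run once the copied file changes."""
    old = cache_keys(instructions, old_file)
    new = cache_keys(instructions, new_file)
    for i, (old_key, new_key) in enumerate(zip(old, new)):
        if old_key != new_key:
            # This step and every later step lose their cache.
            return len(instructions) - i
    return 0


copy_first = [
    "FROM ubuntu",
    "COPY /app/pipeline.py /app/pipeline.py",
    "RUN apt-get update",
    "RUN apt-get install -y python3",
]
copy_last = [
    "FROM ubuntu",
    "RUN apt-get update",
    "RUN apt-get install -y python3",
    "COPY /app/pipeline.py /app/pipeline.py",
]

print(rerun_count(copy_first, b"v1", b"v2"))  # 3
print(rerun_count(copy_last, b"v1", b"v2"))   # 1
```

Placing the instruction that changes most often last keeps the slow package installation steps in the cache.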