Optimizing Docker images

Intermediate Docker

Mike Metzger

Data Engineering Consultant

Docker image explanation

  • A Docker image is the base of a given container
  • It holds all content initially available to a container instance

Docker image concerns

  • Tempting to add all potentially needed components to an image
  • Size becomes large / unwieldy
  • Difficult to handle security / updates due to dependency issues
  • Harder to combine containers without wasting space / bandwidth

Docker image recommendations

  • Split containers to the smallest level needed
  • Easier to combine multiple containers later vs. building a single large image
  • Like building with reusable components vs. building from scratch each time
  • Updates to specific software affect only the containers using that image, rather than every container in the project
  • Can optimize for size, making use and distribution much easier
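
The update point above can be sketched as a small shell helper. This is a minimal sketch, not a prescribed workflow: the function name refresh_container and the DOCKER override variable are hypothetical, and it assumes the container was started with a fixed name. Only standard docker subcommands (pull, stop, rm, run) are used.

```shell
#!/usr/bin/env bash
# Set DOCKER=echo to preview the commands without running them.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: replace one container with a fresh copy started
# from an updated image. Other containers are left untouched.
# Arguments: container name, image reference, then extra `docker run` options.
refresh_container() {
  local name="$1" image="$2"
  shift 2
  $DOCKER pull "$image" &&
    $DOCKER stop "$name" &&
    $DOCKER rm "$name" &&
    $DOCKER run -d --name "$name" "$@" "$image"
}
```

For example, refresh_container web nginx:latest -p 8080:80 would replace only the web container; a database container running alongside it keeps running.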

Docker image breakdown example

  • Consider a data engineering project using the following software:

    • PostgreSQL database
    • Python ETL software
    • Web server software
  • A single image is possible, but it would need to be rebuilt each time the ETL or web server setup changed.

  • What would happen if we needed to add another web server?
FROM ubuntu
RUN apt update
RUN apt install -y postgresql
RUN apt install -y nginx
RUN apt install -y python3.9
...

Example with minimized containers

  • Better options with Docker
  • Split each into its own container
    • PostgreSQL database container
    • Python ETL components
    • Web server
  • Can build an optimized configuration for our use, and can add / remove components as needed
bash> docker run -d postgres:latest

bash> docker run -d nginx:latest ...
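
As a rough illustration of the split layout, the sketch below starts the database and web server as separate containers on a shared network, then adds a second web server without touching any image. The container names, network name, ports, and password are placeholders chosen for this example, and the function names and DOCKER override variable are hypothetical. The Python ETL container is omitted because its image is project-specific.

```shell
#!/usr/bin/env bash
# Set DOCKER=echo to preview the commands without running them.
DOCKER="${DOCKER:-docker}"

start_stack() {
  # A user-defined network lets the containers reach each other by name.
  $DOCKER network create etl-net
  # The official postgres image expects POSTGRES_PASSWORD on first start.
  $DOCKER run -d --name db --network etl-net -e POSTGRES_PASSWORD=change-me postgres:15
  $DOCKER run -d --name web1 --network etl-net -p 8080:80 nginx:latest
}

# Adding another web server is one more container, not a rebuilt image.
# Arguments: container name, host port to publish.
add_web_server() {
  local name="$1" host_port="$2"
  $DOCKER run -d --name "$name" --network etl-net -p "${host_port}:80" nginx:latest
}
```

Calling start_stack followed by add_web_server web2 8081 would leave one database and two web servers running, each replaceable on its own.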

Determining image size

  • Use the docker images command
  • Shows individual image details, including size
  • More in-depth options covered later

bash> docker images

REPOSITORY      TAG              SIZE
postgres        latest           448MB
postgres        15               442MB
apache/airflow  2.7.1-python3.9  1.4GB
alpine          latest           7.73MB
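
The sizes in the listing above mix units (MB and GB), so comparing them takes a normalization step. The following is a minimal sketch, assuming Docker prints sizes with decimal units (B, kB, MB, GB); the function names size_to_mb and images_by_size are hypothetical. The --format option and the .Repository, .Tag, and .Size template fields are part of the docker images command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a size string such as "448MB" or "1.4GB"
# into megabytes so that sizes can be compared numerically.
size_to_mb() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                      # leading numeric part, e.g. 1.4
    u = s; sub(/^[0-9.]+/, "", u)  # trailing unit, e.g. GB
    if (u == "GB") n *= 1000
    else if (u == "kB") n /= 1000
    else if (u == "B") n /= 1000000
    printf "%.2f\n", n
  }'
}

# List local images, largest first (requires a running Docker daemon).
images_by_size() {
  docker images --format '{{.Repository}}:{{.Tag}} {{.Size}}' |
    while read -r name size; do
      printf '%s %s\n' "$(size_to_mb "$size")" "$name"
    done | sort -rn
}
```

Running images_by_size on a machine with the images listed above would put apache/airflow first and alpine last.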

Let's practice!
