Docker Primer (Rev: 06/02)

Docker has become a cornerstone of the modern development stack: how applications are built, shipped, and run. It is an excellent answer to the portability problem: containers run consistently on any machine, eliminating the classic "it works on my machine" complaint. It leverages containers: self-contained units of software that package the components needed for an application to run. This primer introduces its core concepts.

Revision: 20240602-0 (init: 20240501)
 
Docker is a virtualization technology for developing, deploying, and running applications.
It is a containerization platform that provides a way to package applications into standardized units called containers.
Containers, the backbone of Docker, encapsulate everything an application needs to run: the code, its runtime environment, system tools, libraries, and settings. Their efficient use of system resources is a crucial feature that sets Docker apart.
 

About Docker

Docker and Virtual Machines

Docker and virtual machines (VMs) are virtualization technologies that work differently, and to better understand this, let's discuss computers and Operating Systems (OS):
  • A computer has access to many resources: the CPU (with a set number of compute cores), its memory (generally in GB), storage space (in GB or TB), and additional peripherals (video, sound, input devices, etc.)
  • An Operating System consists of a kernel and the processes that run on top of this kernel. The kernel is the interface between the hardware and the processes running on the computer. It manages memory, schedules processes, allows processes to communicate with the underlying hardware using drivers, and handles requests from processes for system resources.
With the increased availability of resources in modern computers, efficient utilization and flexibility via application portability have become a central focus of IT operations. Both virtual machines (VMs) and containers offer distinct ways to run an application on top of a running operating system, giving developers a range of options to choose the best approach for their needs.
VMs create a complete virtual computer system on top of your physical hardware: each VM has its own OS, applications, and files. VMs consume the resources they reserve but can run an alternate OS. For example, it is possible to run a Windows VM on a Linux machine with 12 cores and 24GB of memory by allocating some of the cores and memory: 4 cores and 8GB. When the VM is started, the machine’s hypervisor (where Windows runs) has exclusive access to those 4 cores and 8GB, leaving 8 cores and 16GB to the Linux Host OS. Devices, such as a webcam, can be passed through directly to the VM’s Guest OS, but here, too, the device becomes available only to the Guest OS until the VM is shut down. At that point, all reserved resources become part of the Host OS again. Because VMs run a whole operating system, they are resource-intensive, take longer to start up, and require more disk space and memory than containers. VMs are an infrastructure solution where a Guest OS runs various applications within a pre-defined set of system resources. They are a good choice when running applications that require a specific operating system or complete isolation from others, which can benefit security-sensitive applications.
Docker uses a different approach called containerization: it shares the host operating system's kernel but runs in isolated user space. Containers package an application and its dependencies, ensuring it runs consistently across different environments. Their sharing of the host kernel makes them much faster to start and more efficient with resources. Most container technologies are Linux-based and do not reserve resources because they share the Host OS kernel. As such, where our Linux host machine has 12 cores and 24GB of memory, the container also sees those (unless restricted to a subset of those resources at runtime). Docker runs processes isolated from the host’s main processes; each container has its own namespace. Suppose processes A and B run on the kernel and C is started within a container: the container will only see process C, while the host kernel will see A, B, and C. Because of their low impact on resources, many containers can run on a single machine where only a few VMs could. Containers can also be run within a Linux VM; in fact, on Windows and macOS hosts, a lightweight Linux VM is started to provide a running Linux kernel for the containers’ processes. Containers are an application-level solution that maximizes the number of applications running on minimal additional infrastructure. They are ideal for deploying and scaling microservice applications, packaging an application and its dependencies into a container that is easy to deploy and run consistently across different environments.
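As an illustration of that runtime restriction, docker run accepts standard resource-limit flags; the values below are arbitrary examples, not recommendations:
docker run --rm --cpus 2 --memory 4g alpine sleep 60
Here the container is allowed the equivalent of 2 CPU cores and 4GB of memory; without those flags, it can use whatever the host kernel is willing to give it.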

Docker Concepts

Docker provides a way to package applications into standardized units called containers that hold everything an application needs to run: the code, its runtime environment, system tools, libraries, and settings.
Docker is an open-source containerization platform that allows users to package applications and their dependencies into portable, self-contained units called “containers.” A container is a lightweight, standalone, executable package that includes everything needed to run an application - code, runtime, system tools, libraries, and settings. Containers are isolated from each other and the host system, providing a consistent and reproducible environment for applications to run. The Docker architecture primer offers a good insight into how to get started.
Docker Key Components
  1. Docker Engine: The core of Docker, responsible for building, running, and distributing containers.
  2. Docker Images: Read-only templates used to create containers. Images are built from a set of instructions called a Dockerfile.
  3. Docker Containers: Running instances of Docker images containing the application and its dependencies.
  4. Docker Registry: A repository for storing and distributing Docker images, like Docker Hub.
Docker Architecture: Docker uses a client-server architecture, illustrated below:
  1. Docker Client: The command-line interface (CLI) that communicates with the Docker daemon.
  2. Docker Daemon: The server component that manages Docker objects (images, containers, networks, volumes).
  3. Docker Registry: A service for hosting and distributing Docker images.
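A quick way to see this client-server split in action (the output will vary with your installation):
docker version
docker pull alpine
docker version reports both a Client and a Server (daemon) section, while docker pull has the client ask the daemon to fetch an image from the configured registry (Docker Hub by default).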

Images vs Containers

An image is the base/blueprint for a Docker container: a read-only "set of layers" (more on that soon) that defines the container's contents. You can docker build your own images using Dockerfiles or download pre-built ones from public repositories like Docker Hub and quay.io.
When building your own, you will rely on a FROM (a base image) that specifies the operating system (OS) type and version (e.g., Ubuntu Linux 24.04). In that Dockerfile, you will detail any application code/binaries, libraries, and dependencies the application needs, as well as the configuration files (or external environment variables) required to run the container.
A container is a running instance of a Docker image. It's an isolated process on your system with its own file system, network interface, and resource limits (if specified, for CPU or Memory). You can run multiple containers from the same image.
Think of an image as a Lego set with all the bricks and instructions to build a specific model (the application). A container is the actual assembled Lego model (the running application).
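To illustrate the distinction (the container names and the sleep duration below are arbitrary), the same image can back several containers at once:
docker run -d --name lego-a alpine sleep 300
docker run -d --name lego-b alpine sleep 300
docker container ls
docker container rm -f lego-a lego-b
Both containers are independent instances created from the single alpine image; the last command simply removes them again.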
Docker images are built using layers, relying on a union file system that overlays the layers virtually and presents a unified view of the container's filesystem: the container sees all the files from each layer as if they existed in a single location. Each layer represents a specific instruction or step executed during the image creation process, adding something new to the filesystem. The final image is the combination of all these layers stacked on top of each other. For more details, see https://docs.docker.com/build/guide/layers/
This layering proves helpful for build caching: because layers are cached, Docker only rebuilds the layers that have changed since the last build, which saves time, especially for complex images with many dependencies.
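A common way to take advantage of this caching is to order Dockerfile instructions from least to most frequently changed. The sketch below assumes a Python application with a requirements.txt; the file names are purely illustrative:
FROM python:3.12-slim
WORKDIR /app
# Dependencies change rarely: this layer stays cached across most rebuilds
COPY requirements.txt .
RUN pip install -r requirements.txt
# Application code changes often: only the layers from here down get rebuilt
COPY . .
CMD ["python", "app.py"]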

Docker usage

Installation

There are multiple ways to get started with a Docker installation.
  • Docker Desktop provides a UI for the container runtime and simplifies the visualization of resource usage for existing containers and their images.
  • On macOS and Linux, you can install the Docker Command Line Interface (CLI) using Homebrew by running brew install docker; the commands below offer a quick way to verify the installation.
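Once installed, a couple of commands will confirm that the CLI can reach a running Docker engine:
docker --version
docker info
docker --version only checks the client binary; docker info also queries the daemon, so it will fail if the engine is not running (or if your user is not permitted to reach it).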

Hello World

If you have a sudo-less docker ready, you can run docker run hello-world to get some details about your installation and the host it is running on.
"hello-world" is the traditional first step when starting with Docker: it tests whether your Docker installation is working correctly and demonstrates downloading an image and running a container.
Docker first checks your local system for a downloaded image named "hello-world." Since it doesn't find one, it pulls the image from the Docker Hub public repository. Once the image is available locally, Docker creates a container based on the image and executes the code within it, which prints the "hello-world" message. After that, the container exits, as its purpose is fulfilled.
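You can confirm both halves of that story after the run: the pulled image is now cached locally, and the exited container remains listed until it is removed.
docker image ls hello-world
docker container ls -a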

Dockerfile

Let's say we have a simple Dockerfile as follows:
FROM alpine
CMD ["echo", "simple echo output"]
This will pull a Linux Alpine image and run an echo command. This image will also come from Docker Hub, specifically the https://hub.docker.com/_/alpine/ image registry. Alpine is a "minimal Docker image based on Alpine Linux with a complete package index and only 5 MB in size."
Because we are not specifying a tag (Docker images are referenced using the image:tag format), it defaults to latest (if one is available).
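For reproducible builds, it is generally preferable to pin a tag explicitly; for example (the tag value below is only illustrative, check Docker Hub for the tags currently published):
FROM alpine:3.20
pulls that specific release, whereas FROM alpine resolves to alpine:latest.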
To use this Dockerfile, we need to build it to obtain an image.
docker build --tag test:0.1 .
We build an image and tag it with a name:version; we also tell the build command where to find the Dockerfile (here ., the current directory).
Now that we have an image, we can run it
docker run test:0.1
This displays the line from the echo command.
Using docker images will show us our test:0.1 image, while docker container ls will show us a list of running containers.
Since our container exited, you will need docker container ls -a to see it; it will be in an exited status, and you will notice its name was randomly assigned.
docker run --rm -it --name test_container test:0.1 /bin/sh
The above command does many things:
  • --rm we ask for the container to be deleted after we exit
  • -it we are asking for an interactive terminal
  • --name test_container we are asking for it to be named test_container
  • we are specifying the command to run within the container as /bin/sh
We now have an interactive shell running within that alpine container; you can ls within it. When you exit and list the containers again, you will find it gone.
To delete our first attempt, we can refer to the container by either its name or its ID (the first few characters of the ID are usually sufficient; adapt the value below to your own).
docker container rm ed35

Mounting a directory or file

To access a host file (or directory) from within a container, use the -v option of docker run to mount a volume into your Docker container.
docker run -v <host_path>:<container_path> <image_name>
Any data created within a Docker container is lost when the container is removed. Volumes allow data to be stored outside the container, ensuring it is not deleted when the container stops. By default, Docker containers run as the root user, so data written within those mounted directories will be owned by that user (on macOS, the mount is performed as the user starting the docker command).
It is also possible to create a Docker volume, which can perform better than mounting host directories or files into the running container. These “named volumes” are managed by Docker and have their own lifecycle.
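A minimal sketch of a named volume’s lifecycle (the volume name is arbitrary):
docker volume create demo-data
docker run --rm -v demo-data:/data alpine sh -c "echo hello > /data/greeting.txt"
docker run --rm -v demo-data:/data alpine cat /data/greeting.txt
docker volume rm demo-data
The second run still sees the file written by the first, because the data lives in the Docker-managed volume rather than in either container.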

Exposing ports

If you want to expose a port for a service to be accessible outside of the Docker network, you will need to:
  • Add an EXPOSE line within the Dockerfile to document the internal port within the container that answers requests for the given service (EXPOSE alone does not publish the port).
  • Run the docker command with -p <host_port>:<container_port>, mapping the port on which the service will be available on the container’s host to the port exposed within the container.
 
Based on the last two sections, and adapting an example from CTPO:
docker run --rm \
  -v `pwd`:/iti \
  -p 8765:8888 \
  infotrend/ctpo-jupyter-tensorflow_pytorch_opencv
  • We are mounting the current directory as /iti, which is the directory used by JupyterLab to store its files
  • We are exposing Jupyter’s port 8888 as 8765 on the host system. By going to http://localhost:8765, you will be shown the Jupyter login page

Further testing

  • A fellow self-hoster’s content to “Learn Docker basics for efficient self-hosting. Explore setup, benefits, and practical steps for seamless application deployment in this comprehensive guide.” https://nerdyarticles.com/docker-101/

Revision History

  • 20240602-0: Added Henning’s “Docker 101” link
  • 20240527-0: Initial public release
  • 20240524-0: Ubuntu 24.04 update and expansion
  • 20240404-0: Initial writeup (Ubuntu 22.04 version)
 