Docker Volume

  • What’s under the hood?

Docker leverages union filesystems, which build a single logical filesystem by grouping different directories and filesystems together. Each filesystem is made available as a branch, which becomes a separate layer. Docker images are based on a union filesystem, where each branch represents a new layer. This allows images to be constructed and deconstructed as needed instead of being built as one large, monolithic image.

Docker uses the copy-on-write capabilities of the underlying union filesystem to add a writeable “working directory” or temporary filesystem on top of the existing read-only layers. When Docker first starts a container, this initial read-write layer is empty until the running container process makes changes to the filesystem. When a Docker image is created from an existing container, only the changes that have been “copied up” to this writeable working directory are added as the new layer. This approach enables reuse of images without duplication or fragmentation.

When a process attempts to write to an existing file, the filesystem implementing the copy-on-write feature creates a copy of the file in the topmost working layer. All other processes using the original image’s layers continue to access the read-only, original version of the layer. This technique optimizes both image disk-space usage and container start times.
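A quick way to see copy-on-write in action is `docker container diff`, which lists the files that have been “copied up” into a container’s writable layer (a minimal sketch; the container name is an arbitrary choice):

```shell
# Start a container and modify a file that lives in a read-only image layer
docker container run --name cow-demo alpine sh -c 'echo tweak >> /etc/hostname'

# List the changes held in the container's writable top layer:
# C = changed, A = added. /etc/hostname appears here because a copy of it
# was made in the working layer; the underlying image layer is untouched.
docker container diff cow-demo

# Clean up
docker container rm cow-demo
```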

  • Let’s talk about using Persistent Storage volume

By default all files created inside a container are stored on a writable container layer. This means that:

  • The data doesn’t persist when that container no longer exists, and it can be difficult to get the data out of the container if another process needs it.
  • A container’s writable layer is tightly coupled to the host machine where the container is running. You can’t easily move the data somewhere else.
  • Writing into a container’s writable layer requires a storage driver to manage the filesystem. The storage driver provides a union filesystem, using the Linux kernel. This extra abstraction reduces performance as compared to using data volumes, which write directly to the host filesystem.
  • Docker has two options for containers to store files in the host machine, so that the files are persisted even after the container stops: volumes, and bind mounts. If you’re running Docker on Linux you can also use a tmpfs mount.
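The difference is easy to demonstrate: data written to the writable layer is discarded with the container, while data written to a volume survives it (a sketch; the volume name is arbitrary):

```shell
# Data in the writable layer dies with the container
docker container run --rm alpine sh -c 'echo gone > /tmp/file'
# /tmp/file no longer exists anywhere; the writable layer was discarded.

# Data in a volume outlives the container that wrote it
docker container run --rm -v demo-vol:/data alpine sh -c 'echo kept > /data/file'
docker container run --rm -v demo-vol:/data alpine cat /data/file

# Clean up
docker volume rm demo-vol
```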

  • What are the methods to ensure I have persistent storage?

There are three methods. Let’s talk about each of them.

Host-Based Persistence

In host-based persistence, multiple containers can share one or more volumes. When multiple containers write to a single shared volume, data corruption is possible. Data volumes are also directly accessible from the Docker host, so you can read and write to them with normal Linux tools. In most cases you should not do this, as direct access can corrupt data if your containers and applications are unaware of it.

There are three ways of using host-based persistence, with subtle differences in the way they are implemented.

Implicit Per-Container Storage

The first mechanism creates an implicit storage sandbox for the container that requested host-based persistence. The directory is created by default under /var/lib/docker/volumes on the host during the creation of the container. When the container is removed with the -v flag (or was started with --rm), this anonymous volume is automatically deleted on the host by the Docker Engine; otherwise it is left behind.


Explicit Shared Storage (Data Volumes)

The second mechanism is used when there is a need to share data across multiple containers running on the same host. In this scenario, an explicit location on the host filesystem is exposed as a mount within one or more containers. This becomes especially useful when multiple containers need read-write access to the same directory.

This technique is the most popular one used by DevOps teams. Referred to as data volumes in Docker, it offers the following benefits:

  • Data volumes can be shared and reused across multiple containers.
  • Changes made to a data volume are made directly, bypassing the engine’s storage backend image layers implementation.
  • Changes applied to a data volume will not be included when the image gets updated.
  • Data volumes are available even if the container itself is deleted.
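A minimal sketch of one data volume shared between two containers (the volume and container names are arbitrary; the :ro flag for the reader is an optional safeguard):

```shell
# Create a named volume and have a writer container put data in it
docker volume create shared-data
docker container run --rm -v shared-data:/data alpine \
  sh -c 'echo hello > /data/msg.txt'

# A second container mounting the same volume (read-only) sees the file
docker container run --rm -v shared-data:/data:ro alpine cat /data/msg.txt

# Clean up
docker volume rm shared-data
```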


Shared Multi-Host Storage

Customers deploying containerized workloads in production often run them in a clustered environment, where multiple hosts participate to deliver required compute, network and storage capabilities. This scenario demands distributed storage and shared file system such as Ceph, GlusterFS, NFS that is made available to all hosts and is then exposed to the containers through a consistent namespace.
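With the built-in local driver, Docker can mount an NFS export as a named volume, so every host in the cluster resolves the same volume name to the same shared storage. A sketch, where the server address 192.168.1.100 and export path /exports/appdata are hypothetical placeholders:

```shell
# Create a volume backed by an NFS export (run on each host in the cluster)
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/exports/appdata \
  nfs-data

# Any container on any host mounting nfs-data now shares the same files
docker container run --rm -v nfs-data:/data alpine ls /data
```

Note that the NFS mount itself only happens when a container first uses the volume, so a misconfigured server address surfaces at `docker container run` time, not at volume creation.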


Let’s get our hands dirty

  • How to create a volume ?
$ docker container run -i -t -v /datavol --name datavolume alpine sh
# cd /datavol; touch mysamplefile; ls

N.B: We can use the -v option with the ‘docker container run’ command. In the example above, the -v /datavol option creates a directory on the host system and mounts it at “/datavol” inside the container.

We created a file “mysamplefile” inside this volume. Now exit the container and let’s inspect the volume.

$ docker container inspect --format '{{ json .Mounts }}' datavolume

N.B : The above command will show you the source volume on the host system and the destination volume /datavol inside the container.

  • Lets get into the source folder and see whats there inside.
$ cd /var/lib/docker/volumes/<VOLUME_ID>/_data
$ ls

N.B : The above should show you the file we have created earlier.

  • How to list volumes ?
$ docker volume ls
  • How to create a named volume ?
$ docker volume create --name mystoragevolume
  • How do i see the above volume created ?
$ docker volume ls 
  • How to mount the above volume ?
$ docker container run -i -t -v mystoragevolume:/data --name myvolume alpine sh

N.B : Create a file inside and see from the source volume, you can see the contents.

  • How to remove a volume ?
$ docker container rm -f myvolume
$ docker volume rm mystoragevolume
  • How to mount a host directory inside a container ?
$ mkdir /mnt/shared
$ echo "Hello my first persistent storage demo" > /mnt/shared/storage.txt
$ docker container run -it -v /mnt/shared:/data alpine sh
# cd /data
# cat storage.txt
  • How to mount a volume in read-only mode ?
$ docker container run -it -v /mnt/shared:/data:ro alpine sh
# cd /data
# echo "hello" > storage.txt

N.B: You will get a message saying it’s a read-only filesystem.

  • How to know about docker disk usage ?
$ docker system df
  • How to remove all unused volumes ?
$ docker volume prune

Scenario

  • Create a volume named mysql-data
  • Create a MySQL DB (stateful) Docker container and make it use the persistent storage volume we just created, so that MySQL stores its DB and files in this volume. MySQL stores its data in the /var/lib/mysql folder. Run it in daemon mode.
  • Now login to the container and check to see all files
  • Now stop and remove the above MySQL container
  • Now create a new MySQL Docker container and make it use the same storage volume where the MySQL DB and files already exist.
  • Check that everything is running as expected
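One possible walk-through of the scenario above (the root password and container names are arbitrary choices):

```shell
# 1. Create the volume
docker volume create mysql-data

# 2. Run MySQL in daemon mode, storing its data in the volume
docker container run -d --name mysql1 \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v mysql-data:/var/lib/mysql \
  mysql

# 3. Log in to the container and check the DB files
docker container exec -it mysql1 ls /var/lib/mysql

# 4. Stop and remove the first container; the volume survives
docker container rm -f mysql1

# 5. A new container reusing the same volume picks up the existing DB
docker container run -d --name mysql2 \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v mysql-data:/var/lib/mysql \
  mysql

# 6. Verify the pre-existing data files are there
docker container exec mysql2 ls /var/lib/mysql
```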