Docker leverages a union filesystem: a logical filesystem built by grouping different directories and filesystems together. Each filesystem is made available as a branch, which becomes a separate layer. Docker images are built on a union filesystem, where each branch represents a layer. This allows images to be constructed and deconstructed as needed instead of being shipped as one large, monolithic image.
Docker uses the copy-on-write capability of the underlying union filesystem to add a writable “working directory” (a temporary filesystem) on top of the existing read-only layers. When Docker first starts a container, this initial read-write layer is empty; it stays empty until the running container process changes the filesystem. When a Docker image is created from an existing container, only the changes that have been “copied up” into this writable working directory are added as the new layer. This approach enables images to be reused without duplication or fragmentation.
When a process attempts to write to an existing file, the copy-on-write filesystem creates a copy of that file in the topmost, writable layer. All other processes using the original image’s layers continue to access the read-only, original version. This technique optimizes both image disk usage and container start times.
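The copy-up step described above can be sketched with two plain directories (illustrative only; a real union filesystem such as overlay2 does this transparently in the kernel):

```shell
#!/bin/sh
# lower = read-only image layer, upper = writable container layer
set -e
mkdir -p lower upper
echo "original" > lower/config.txt        # file shipped in the image

# Reads fall through to the lower layer when the upper layer has no copy.
read_file() {
  if [ -f "upper/$1" ]; then cat "upper/$1"; else cat "lower/$1"; fi
}

# Writes first copy the file up into the writable layer, then modify the copy.
write_file() {
  [ -f "upper/$1" ] || cp "lower/$1" "upper/$1"   # the "copy-up"
  echo "$2" > "upper/$1"
}

read_file config.txt              # -> original (served from the lower layer)
write_file config.txt "modified"
read_file config.txt              # -> modified (served from the upper layer)
cat lower/config.txt              # -> original (the image layer is untouched)
```

Note how the lower layer never changes: that is exactly why many containers can share one image's layers safely.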
By default, all files created inside a container are stored in this writable container layer. This means the data does not persist when the container is removed, and it is hard for another process or container to get at that data.
Docker has two options for containers to store files on the host machine so that the files persist even after the container stops: volumes and bind mounts. If you’re running Docker on Linux, you can also use a tmpfs mount.
What are the methods to ensure I have persistent storage?
There are three methods. Let’s talk about each of them.
In host-based persistence, multiple containers can share one or more volumes. When multiple containers write to a single shared volume, data corruption is possible. Data volumes are also directly accessible from the Docker host, which means you can read and write to them with normal Linux tools. In most cases you should not do this, as direct access can likewise corrupt data if your containers and applications are unaware of it.
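One common way to avoid the corruption described above is to have each writer take a file lock before touching shared data. A minimal sketch with flock(1), where the directory shared-demo stands in for a volume mounted into several containers (it is just a plain local directory here, purely for illustration):

```shell
#!/bin/sh
set -e
mkdir -p shared-demo
: > shared-demo/log.txt

append_safely() {
  # Hold an exclusive lock on a lock file while appending, so concurrent
  # writers cannot interleave their output.
  flock shared-demo/log.lock sh -c 'echo "$0" >> shared-demo/log.txt' "$1"
}

# Two concurrent "containers" appending to the shared file:
append_safely "writer-1: hello" &
append_safely "writer-2: hello" &
wait

wc -l < shared-demo/log.txt    # -> 2 (both lines arrive intact)
```

Inside real containers the same pattern works as long as every writer locks through the shared volume, since the lock file lives on the volume itself.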
There are three ways of using host-based persistence, with subtle differences in the way they are implemented.
The first mechanism creates an implicit storage sandbox for the container that requested host-based persistence. The directory is created by default under /var/lib/docker/volumes on the host when the container is created. When the container is removed with the -v flag (docker container rm -v), the Docker Engine also deletes this anonymous volume from the host.
The second mechanism is used when there is a need to share data across multiple containers running on the same host. In this scenario, an explicit location on the host filesystem is exposed as a mount inside one or more containers. This becomes especially useful when multiple containers need read-write access to the same directory.
The third technique is the most popular one used by DevOps teams. Referred to as data volumes in Docker, it offers benefits such as being shareable and reusable across containers, and persisting even after the containers that use them are deleted.
Customers deploying containerized workloads in production often run them in a clustered environment, where multiple hosts deliver the required compute, network, and storage capabilities. This scenario demands distributed storage or a shared filesystem, such as Ceph, GlusterFS, or NFS, that is made available to all hosts and then exposed to the containers through a consistent namespace.
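As a sketch of what this looks like in practice, Docker's built-in local volume driver can mount an NFS export as a named volume. The Compose fragment below assumes an NFS server at nfs.example.com exporting /export/data; both are placeholder values, and the export options depend entirely on your NFS setup:

```yaml
# docker-compose.yml — sketch of an NFS-backed named volume
services:
  app:
    image: alpine
    command: ls /data
    volumes:
      - shared-data:/data

volumes:
  shared-data:
    driver: local
    driver_opts:
      type: nfs
      o: addr=nfs.example.com,rw
      device: ":/export/data"
```

Any host in the cluster that can reach the NFS server can mount the same volume, which is what gives the containers a consistent namespace.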
$ docker container run -i -t -v /datavol --name datavolume alpine sh
# cd /datavol; touch mysamplefile; ls
N.B : We can use the -v option with the docker container run command. In the example above, -v /datavol creates an anonymous volume on the host system and mounts it at /datavol inside the container.
We created a file, mysamplefile, inside this volume. Now exit the container and let’s inspect the volume.
$ docker container inspect -f '{{ json .Mounts }}' datavolume
N.B : The above command will show you the source volume on the host system and the destination volume /datavol inside the container.
$ cd /var/lib/docker/volumes/<VOLUME_ID>/_data
# ls
N.B : The above should show you the file we have created earlier.
$ docker volume ls
$ docker volume create --name mystoragevolume
$ docker volume ls
$ docker container run -i -t -v mystoragevolume:/data --name myvolume alpine sh
N.B : Create a file inside the container under /data; from the source volume on the host you will see the same contents.
$ docker container rm -f -v myvolume
N.B : The -v flag only removes anonymous volumes; the named volume mystoragevolume survives and must be deleted explicitly with docker volume rm mystoragevolume when no longer needed.
$ mkdir /mnt/shared
$ echo "Hello my first persistent storage demo" > /mnt/shared/storage.txt
$ docker container run -it -v /mnt/shared:/data alpine sh
# cd /data
# cat storage.txt
$ docker container run -it -v /mnt/shared:/data:ro alpine sh
# cd /data
# echo "hello" > storage.txt
N.B : You will get a message saying it is a read-only file system.
$ docker system df
N.B : Shows disk usage by images, containers, and volumes.
$ docker volume prune
N.B : Removes all volumes not currently used by at least one container.