Understanding Docker volumes
Docker data volumes are simply directories on the host, mounted read-write or read-only at specific paths inside the container. We can also say container “exposes” a path as the volume or that it binds the path with the volume. Some say container has a volume but that does sound a bit ambiguous.
Volumes allow container to persist data on the host, outside the union file system. Changes to files in the exposed volumes are therefore applied directly just as we would make them to the directory on the host.
Command “docker inspect <container>” shows all container paths mounted to volumes.
When creating a new container with “docker run -v …” Docker mounts a volume at a given path. We can select host directory as the new volume or let the Docker make it automatically in /var/lib/docker/volumes. If we let Docker create new volume, it will automatically remove it when all containers using it are removed.
New container can also reuse all volumes previously associated with some other container. Command “docker run –volumes-from <container> …” would mount all volumes associated with a given container inside a new container. This is useful for sharing data between containers where we let the Docker automatically manage where volumes are stored.
All containers are created from images. An image can provide a list of paths that should be mounted to volumes. We can see this list with “docker inspect <image>”. The output only shows paths, and these paths are not mounted to any volumes yet. When we do “docker run … <image>” to create a new container from the image we must associate each path with a volume. If we do not give any options, Docker would automatically make required volumes in /var/lib/docker/volumes. When we remove the container, that volumes (if they are not shared with other containers) would be deleted. Each time we create new container Docker will make new volume(s) and mount them within the container.
What may look strange is that when automatically creating a volume for a new container, Docker would copy any existing data from the image. For example if we expose /var/log and create a new container without mount target, Docker would create a new volume in /var/lib/docker/volumes with a copy of /var/log from the image. If we mount volumes manually, using an empty host directory for example, nothing will be copied from the image and we will simply override mount target with a given (empty) directory.
This behavior makes it very easy to expose any path from the image as automatically managed volume to get data persistence. The new volume will be initialized with whatever data is present in the image. But it may be confusing if we manually mount the same path to a host directory that does not contain required data skeleton from the image and get a process crashing because of that.
When mounting volumes users (uid/gid) may not match and ownership and permissions might be messed up. This is why it is usually recommended to have a data-only container (without a running process) exposing volumes and a container using those volumes, both sharing same users. That saying you do not run everything as root :).
3 Comments
Jul 1, 2014 Petar Marić
I didn't know you were into Docker as well, very nice analysis. Business, research, or hobby usage?
Jul 1, 2014 Goran
I would not call it business yet, but not a hobby either :)
Aug 24, 2016 Eric Hontz
Great post! Thank you! I've been seeing statements like, "the container exposes a volume at _ for data persistence," but didn't understand if I needed to specify a mounting directive when running. I also didn't know when an exposed volume was removed or that the volume would be blank if I specified a mount at the exposed path.