How the docker container creation process works (from docker run to runc)


Over the past few months I’ve been investing a good bit of personal time studying how Linux containers work. Specifically, what does docker run actually do. In this post I’m going to walk through what I’ve observed and try to demystify how all the pieces fit togther. To start our adventure I’m going to create an alpine container with docker run:

$ docker run -i -t --name alpine alpine ash

This container will be used in the output below. When the docker run command is invoked it parses the options passed on the command line and creates a JSON object to represent the object it wants docker to create. The object is then sent to the docker daemon through the /var/run/docker.sock UNIX domain socket. We can use the strace utility to observe the API calls:

$ strace -s  -e trace=read,write -f docker run -d alpine

[pid ] write(, "GET /_ping HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/1.13.1 (linux)\r\n\r\n", ) =
[pid ] read(, "HTTP/1.1 200 OK\r\nApi-Version: 1.26\r\nDocker-Experimental: false\r\nServer: Docker/1.13.1 (linux)\r\nDate: Mon, 19 Feb 2018 16:12:32 GMT\r\nContent-Length: 2\r\nContent-Type: text/plain; charset=utf-8\r\n\r\nOK", ) =
[pid ] write(, "POST /v1.26/containers/create HTTP/1.1\r\nHost: docker\r\nUser-Agent: Docker-Client/1.13.1 (linux)\r\nContent-Length: 1404\r\nContent-Type: application/json\r\n\r\n{\"Hostname\":\"\",\"Domainname\":\"\",\"User\":\"\",\"AttachStdin\":false,\"AttachStdout\":false,\"AttachStderr\":false,\"Tty\":false,\"OpenStdin\":false,\"StdinOnce\":false,\"Env\":[],\"Cmd\":null,\"Image\":\"alpine\",\"Volumes\":{},\"WorkingDir\":\"\",\"Entrypoint\":null,\"OnBuild\":null,\"Labels\":{},\"HostConfig\":{\"Binds\":null,\"ContainerIDFile\":\"\",\"LogConfig\":{\"Type\":\"\",\"Config\":{}},\"NetworkMode\":\"default\",\"PortBindings\":{},\"RestartPolicy\":{\"Name\":\"no\",\"MaximumRetryCount\":0},\"AutoRemove\":false,\"VolumeDriver\":\"\",\"VolumesFrom\":null,\"CapAdd\":null,\"CapDrop\":null,\"Dns\":[],\"DnsOptions\":[],\"DnsSearch\":[],\"ExtraHosts\":null,\"GroupAdd\":null,\"IpcMode\":\"\",\"Cgroup\":\"\",\"Links\":null,\"OomScoreAdj\":0,\"PidMode\":\"\",\"Privileged\":false,\"PublishAllPorts\":false,\"ReadonlyRootfs\":false,\"SecurityOpt\":null,\"UTSMode\":\"\",\"UsernsMode\":\"\",\"ShmSize\":0,\"ConsoleSize\":[0,0],\"Isolation\":\"\",\"CpuShares\":0,\"Memory\":0,\"NanoCpus\":0,\"CgroupParent\":\"\",\"BlkioWeight\":0,\"BlkioWeightDevice\":null,\"BlkioDeviceReadBps\":null,\"BlkioDeviceWriteBps\":null,\"BlkioDeviceReadIOps\":null,\"BlkioDeviceWriteIOps\":null,\"CpuPeriod\":0,\"CpuQuota\":0,\"CpuRealtimePeriod\":0,\"CpuRealtimeRuntime\":0,\"CpusetCpus\":\"\",\"CpusetMems\":\"\",\"Devices\":[],\"DiskQuota\":0,\"KernelMemory\":0,\"MemoryReservation\":0,\"MemorySwap\":0,\"MemorySwappiness\":-1,\"OomKillDisable\":false,\"PidsLimit\":0,\"Ulimits\":null,\"CpuCount\":0,\"CpuPercent\":0,\"IOMaximumIOps\":0,\"IOMaximumBandwidth\":0},\"NetworkingConfig\":{\"EndpointsConfig\":{}}}\n", ) =
[pid ] read(, "HTTP/1.1 201 Created\r\nApi-Version: 1.26\r\nContent-Type: application/json\r\nDocker-Experimental: false\r\nServer: Docker/1.13.1 (linux)\r\nDate: Mon, 19 Feb 2018 16:12:32 GMT\r\nContent-Length: 90\r\n\r\n{\"Id\":\"b70b57c5ae3e25585edba898ac860e388582391907be4070f91eb49f4db5c433\",\"Warnings\":null}\n", ) =

Now here is were the real fun begins. Once the docker daemon receives the request it will parse the output and contact containerd via the gRPC API to set up the container runtime using the options passed on the command line. We can use the ctr utility to observe this interaction:

Setting up the container runtime is a pretty substantial undertaking. Namespaces need to be configured, the Image needs to be mounted, security controls (app armor profiles, seccomp profiles, capabilities) need to be enabled, etc , etc. You can get a pretty good idea of everything that is required to set up the runtime by reviewing the output of docker inspect containerid and the config.json runtime specification file (more on that in a moment).

Containerd doesn’t actually create the container runtime. It sets up the environment and then invokes containerd-shim to start the container runtime via the configured OCI runtime (controlled with the containerd “–runtime” option) . For most modern systems the container runtime is based on runc. We can see this first hand with the pstree utility:

$ pstree -l -p -s -T

systemd, --switched-root --system --deserialize
├─docker-containe, --listen unix:///run/containerd.sock --shim /usr/libexec/docker/docker-containerd-shim-current --start-timeout 2m --debug
│ ├─docker-containe, 93a619715426f613646359863e77cc06fa85502273df931517ec3f4aaae50d5a /var/run/docker/libcontainerd/93a619715426f613646359863e77cc06fa85502273df931517ec3f4aaae50d5a /usr/libexec/docker/docker-runc-current

Since pstree truncates the process name we can verify the PIDs with ps:

$ ps auxwww | grep []

root       0.0  0.2   ?        Ssl  :   : /usr/libexec/docker/docker-containerd-current --listen unix:///run/containerd.sock --shim /usr/libexec/docker/docker-containerd-shim-current --start-timeout 2m --debug

$ ps auxwww | grep []

root       0.0  0.0    ?        Sl   :   : /usr/libexec/docker/docker-containerd-shim-current 93a619715426f613646359863e77cc06fa85502273df931517ec3f4aaae50d5a /var/run/docker/libcontainerd/93a619715426f613646359863e77cc06fa85502273df931517ec3f4aaae50d5a /usr/libexec/docker/docker-runc-current

When I first started researching the interaction between dockerd, containerd and the shim I wasn’t real sure what purpose the shim served. Luckily Google took me to a great write up by Michael Crosby. The shim serves a couple of purposes:

  1. It allows you to run daemonless containers.
  2. STDIO and other FDs are kept open in the event that containerd and docker die.
  3. Reports the containers exit status to containerd.

The first and second bullet points are super important. These features allows the container to be decoupled from the docker daemon allowing dockerd to be upgraded or restarted w/o impacting the running containers. Nifty! I mentioned that the shim is responsible for kicking off runc to actually run the container. Runc needs two things to do its job: a specification file and a path to a root file system image (the combination of the two is referred to as a bundle). To see how this works we can create a rootfs by exporting the alpine docker image:

$ mkdir -p alpine/rootfs

$ cd alpine

$ docker export d1a6d87886e2 | tar -C rootfs -xvf -

time="2018-02-19T12:54:13.082321231-05:00" level=debug msg="Calling GET /v1.26/containers/d1a6d87886e2/export"
.dockerenv
bin/
bin/ash
bin/base64
bin/bbconfig
.....

The export option takes a container if which you can find in the docker ps -a output. To generate a specificationfile you can use the runc spec command:

$ runc spec

This will create a specification file named config.json in your current directory. This file can be customized to suit your needs and requirements. Once you are happy with the file you can run runc with the rootfs directory as its sole argument (the container configuration will be read from the file config.json file):

$ runc run rootfs

This simple example will spawn an alpine ash shell:

$ runc run rootfs

/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.7.
PRETTY_NAME="Alpine Linux v3.7"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"

Being able to create containers and play with the runc runtime specification is incredibly powerful. You can evaluate different apparmor profiles, test out Linux capabilities and play around with every facet of the container runtime environment without needing to install docker. I just barely scratched the surface here and would highly recommend reading through the runc and containerd documentation. Super cool stuff!

转载自:https://prefetch.net/blog/2018/02/19/how-the-docker-container-creation-process-works-from-docker-run-to-runc/

如何从底层调试docker的更多相关文章

  1. 如何解决Visual Studio 首次调试 docker 的 vs2017u5 exists, deleting Opening stream failed, trying again with proxy settings

    前言 因为之前我电脑安装的是windows10家庭版,然而windows10家庭没有Hyper-v功能. 搜索了几篇windows10家庭版安装docker相关的博客,了解一些前辈们走过的坑. 很多人 ...

  2. pycharm远程调试docker容器内程序

    文章链接: https://blog.csdn.net/hanchaobiao/article/details/84069299 参考链接: https://blog.csdn.net/github_ ...

  3. dotnet core调试docker下生成的dump文件

    最近公司预生产环境.net core应用的docker容器经常出现内存暴涨现象,有时会突然吃掉几个G,触发监控预警,造成容器重启. 分析了各种可能原因,修复了可能发生的内存泄露,经测试本地正常,但是发 ...

  4. macOS 下 PHPStorm + Xdebug 调试 Docker 环境中的代码

    0x00 描述 宿主机是 mac mini,构建的项目在 docker 中,所以需要在 PHPStorm 上配置 Xdebug 进行远程代码调试. 0x01 环境 宿主机:macOS High Sie ...

  5. phpStorm中使用xdebug工具调试docker容器中的程序

    前提准备 phpstorm开发软件 + dnmp(docker + nginx + mysql +php) 配置好hosts 映射比如 /etc/hosts      127.0.0.1 tp5.de ...

  6. 远程调试docker构建的weblogic

    环境信息 OSType: CentOS Linux 7 (Core) x86_64 3.10.0-957.21.3.el7.x86_64 DockerVersion: 19.03.8 Mirrors: ...

  7. 保姆级教程:VsCode调试docker中的NodeJS程序

    最近在写NodeJS相关的项目,运行在docker容器中,也是想研究一下断点调试,于是查阅相关资料,最终顺利配置好了. 首先我选择了VsCode作为ide,并用VsCode来做NodeJS可视化deb ...

  8. 使用 pycharm调试docker环境运行的Odoo

    2019日 星期一 安装docker windows系统,参考 docker官方文档 Mac系统,参考 docker官方文档 构建自定义ODOO镜像 标准ODOO镜像可能不包含特别的python模块, ...

  9. 如何调试 Docker

    开启 Debug 模式 在 dockerd 配置文件 daemon.json(默认位于 /etc/docker/)中添加 { "debug": true } 重启守护进程. $ s ...

随机推荐

  1. Notadd 2.0 全新 Node.js 版本~ (开发中) [从 PHP 到 node 的踩坑记]

    对于 Notadd 我们本来期望它实现更多... 尽管我们也尝试做了很多努力,但是由于 PHP 本身的局限,以及考虑到开发环境配置的复杂程度,最终使用了折中方案.接下来,我们谈谈整个技术选型历程,也供 ...

  2. NOIP2017赛前模拟10月30日总结

    题目1: n个人参赛(n<=100000),每个人有一个权值··已知两个人权值绝对值之差小于等于K时,两个人都有可能赢,若大于则权值大的人赢···比赛为淘汰制,进行n-1轮·问最后可能赢的人有多 ...

  3. 如何通过IP地址分辨公网、私网、内网、外网

    如何通过IP地址分辨公网.私网.内网.外网   内.外网是相对于防火墙而言的,在防火墙内部叫做内网,反之就是外网.   在一定程度上外网等同于公网,内网等同于私网.   地址为如下3个区域就是处于私网 ...

  4. php函数注释

    https://segmentfault.com/q/1010000003087072 phpstorm /**+回车

  5. linux命令 显示文件内容

    通过命令+文件名查看内容.如下命令可以查看.1, cat :由第一行开始显示文件内容:2,tac:从最后一行开始显示,可以看出tac与cat字母顺序相反:3,nl:显示的时候输出行号:4,more:一 ...

  6. kubernetes-dashboard

    1.导入kubernetes-dashboard 镜像 [root@node1 DNS]# docker load < kube-dashboard.tar 6bc90c4dba69: Load ...

  7. 开发人员为组件添加自定义的className

    在开发过程当中需要给组件写上自己的样式,这个时候怎么办呢? 直接给组件添加className这样是无效的 当给组件添加className之后 在写组件的时候需要对使用你的组件的开发人员提供自定义cla ...

  8. 弱题(bzoj 2510)

    Description 有M个球,一开始每个球均有一个初始标号,标号范围为1-N且为整数,标号为i的球有ai个,并保证Σai = M. 每次操作等概率取出一个球(即取出每个球的概率均为1/M),若这个 ...

  9. 回文串(bzoj 3676)

    Description 考虑一个只包含小写拉丁字母的字符串s.我们定义s的一个子串t的“出 现值”为t在s中的出现次数乘以t的长度.请你求出s的所有回文子串中的最 大出现值. Input 输入只有一行 ...

  10. IHTMLDocument2的所有成员、属性、方法、事件[转]

    原文发布时间为:2010-07-01 -- 来源于本人的百度文章 [由搬家工具导入] IHTMLDocument2 InterfaceGets information about the docume ...