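[Notes] Building a Deep Learning Base Image on nvidia/cuda, V0.2

The earlier post, "[Notes] Building a Deep Learning Base Image on nvidia/cuda", is out of date; this version supersedes it.

These notes build a base image for deep learning on top of NVIDIA's official nvidia/cuda image.

The idea is to pack the commonly used tools into one image first, then install whichever framework is needed on top of it later.

To enjoy the fun of reinstalling a system by hand, the steps are run one at a time in a container ("slow is fast") instead of being built from a Dockerfile.

Environment

Docker CE and the NVIDIA Container Toolkit are already installed; see here for the detailed steps.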

Host OS: Ubuntu 18.04 64 bit

CUDA: 10.0

cuDNN: 7.4

Docker CE: 19.03.0

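Image information

The available images are listed under nvidia/cuda. The tags differ as follows:

base: built on CUDA with the minimum set of dependencies, intended for deploying pre-built CUDA applications; any other dependencies must be installed by hand.
runtime: built on base, adds the shared libraries from the CUDA toolkit.
devel: built on runtime, adds the compiler toolchain, debugging tools, header files, and static libraries; intended for compiling CUDA applications from source.

To save trouble, nvidia/cuda:10.0-cudnn7-devel is used here.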

$ sudo docker pull nvidia/cuda:10.0-cudnn7-devel

The image pulled at the time of writing has the following properties:

OS: Ubuntu 18.04.2 LTS

Size: 3.09 GB

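Starting the container

Create a directory named base to make moving data in and out easier; it is mapped to /host inside the container. Then run the following command from inside that directory to create the container dl-base.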

$ sudo docker run -it --gpus all -P --name dl-base -v `pwd`:/host nvidia/cuda:10.0-cudnn7-devel

If all goes well, a prompt like the following appears:

root@d6421dac4cec:/#

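Run nvidia-smi to verify that the CUDA environment inside the container works.

Switching to the Aliyun mirror

From mainland China, the Aliyun mirror is reasonably fast.

Save the following as sources.list in the base directory.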

deb http://mirrors.aliyun.com/ubuntu/ bionic main restricted

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates main restricted

deb http://mirrors.aliyun.com/ubuntu/ bionic universe

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates universe

deb http://mirrors.aliyun.com/ubuntu/ bionic multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-updates multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-backports main restricted universe multiverse

deb http://mirrors.aliyun.com/ubuntu/ bionic-security main restricted

deb http://mirrors.aliyun.com/ubuntu/ bionic-security universe

deb http://mirrors.aliyun.com/ubuntu/ bionic-security multiverse
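
The ten entries above follow a regular pattern, so they can also be generated instead of typed by hand. The following is only a sketch: the variable names MIRROR and OUT are made up for this example, and the lines come out in a different order than above, which does not matter to apt.

```shell
#!/bin/sh
# Sketch: generate the sources.list shown above.
# MIRROR and OUT are illustrative names, not part of the original notes.
MIRROR="http://mirrors.aliyun.com/ubuntu/"
OUT="${OUT:-sources.list}"

: > "$OUT"   # start from an empty file
for suite in bionic bionic-updates bionic-security; do
    for component in "main restricted" universe multiverse; do
        echo "deb $MIRROR $suite $component" >> "$OUT"
    done
done
# backports lists all components on a single line, as in the list above
echo "deb $MIRROR bionic-backports main restricted universe multiverse" >> "$OUT"
```

Run it on the host inside the base directory so that the file shows up as /host/sources.list in the container.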

In the container shell, run the following commands to switch to the new sources and refresh the package index.

$ cp /host/sources.list /etc/apt/sources.list

$ apt update

Installing basic tools

$ apt install -y vim curl wget git iputils-ping net-tools telnet tmux unzip
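
To confirm that the install step worked, one option is to check which commands are still missing from PATH. This is a minimal sketch; missing_tools is a hypothetical helper name, not something provided by any of the packages.

```shell
# Sketch: print every command name from the arguments that is not on PATH.
# missing_tools is a hypothetical helper introduced for this example.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example call inside the container (iputils-ping provides ping,
# net-tools provides ifconfig):
#   missing_tools vim curl git ping ifconfig telnet tmux unzip
```

No output means every listed command was found.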

Creating the work and download directories

$ mkdir -p /work/download

Editing `~/.bashrc`

Append the following to the end of the file:

alias u='cd ..'

alias ins='apt install -y'

alias ta='tmux a -t'

alias jn='jupyter notebook --ip=0.0.0.0 --allow-root'
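
If the edit should be repeatable, the aliases can be appended from a small function instead of an editor. The sketch below assumes a marker comment line to detect an earlier run; add_aliases and the marker text are hypothetical names chosen for this example.

```shell
# Sketch: append the aliases above to a bashrc file, doing nothing when
# they were already added. add_aliases and the marker comment are
# hypothetical names introduced for this example.
add_aliases() {
    rc="$1"
    marker="# dl-base aliases"
    # Skip the append when the marker line is already in the file.
    grep -qxF "$marker" "$rc" 2>/dev/null && return 0
    cat >> "$rc" <<'EOF'
# dl-base aliases
alias u='cd ..'
alias ins='apt install -y'
alias ta='tmux a -t'
alias jn='jupyter notebook --ip=0.0.0.0 --allow-root'
EOF
}

# Inside the container this would be called as:
#   add_aliases ~/.bashrc
```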

Then apply the changes:

$ source ~/.bashrc

Installing `openssh-server`

$ apt install -y openssh-server

Edit /etc/ssh/sshd_config: find the line that starts with #PermitRootLogin and change it to PermitRootLogin yes, which allows logging in as root.
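
The same edit can be made without opening an editor. This is a sketch using sed; enable_root_login is a hypothetical helper name, and the pattern assumes the option sits on its own line, commented out or not, as it does in the stock Ubuntu config.

```shell
# Sketch: set "PermitRootLogin yes" in an sshd_config-style file.
# enable_root_login is a hypothetical helper; pass the config path as $1.
enable_root_login() {
    tmp="$(mktemp)"
    # Rewrite a commented or uncommented PermitRootLogin line; explanatory
    # comment lines that merely mention the option are left alone.
    sed -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' "$1" > "$tmp" \
        && cat "$tmp" > "$1"   # write back in place, keeping file permissions
    rm -f "$tmp"
}

# Inside the container this would be called as:
#   enable_root_login /etc/ssh/sshd_config
```

The ssh service still has to be restarted afterwards for the change to take effect.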

Then set the root password:

$ passwd

Enter the password twice, then restart ssh:

$ /etc/init.d/ssh restart

The following output means it worked.

 * Restarting OpenBSD Secure Shell server sshd [ OK ]

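Root login is used here purely to keep things quick and dirty.

If security is a concern, create a regular user and configure ssh accordingly.

Installing miniconda

Other Python distributions such as anaconda can be installed instead as needed; miniconda is used as the example here.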

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

$ sh Miniconda3-latest-Linux-x86_64.sh

When the installer asks whether to initialize, answer yes:

Do you wish the installer to initialize Miniconda3 by running conda init? [yes|no]

Then apply the changes:

$ source ~/.bashrc

Point pip at the Aliyun mirror:

$ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple

Point conda at the Tsinghua mirror:

$ conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/

$ conda config --set show_channel_urls yes

Install the commonly used packages:

$ conda install jupyter numpy matplotlib Pillow scipy pandas opencv

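The opencv version installed this way is 3.4.2.

Saving the image

At this point, the container can be saved as an image by running the following command on the host.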

$ sudo docker commit dl-base dl/base

This produces an image named dl/base, 6.51 GB in size.

Different frameworks can then be installed on top of this image as needed.
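
When starting a working container from dl/base, it may be useful to publish the ssh and jupyter ports explicitly, since -P only publishes ports that the image declares with EXPOSE. The following is a sketch under assumptions: the host ports 2222 and 8888, the default container name dl-work, and the helper name start_dl_container are all made up for this example; 22 and 8888 are the default ports of sshd and jupyter notebook.

```shell
# Sketch: start a new container from the saved image with the ssh and
# jupyter ports published. Host ports 2222/8888 and the default name
# dl-work are assumptions for illustration.
start_dl_container() {
    name="${1:-dl-work}"
    # Set DRY_RUN=1 to print the command instead of executing it.
    ${DRY_RUN:+echo} sudo docker run -it --gpus all \
        --name "$name" \
        -p 2222:22 -p 8888:8888 \
        -v "$(pwd)":/host \
        dl/base
}

# Example: preview the command first, then run it for real.
#   DRY_RUN=1 start_dl_container tf-dev
#   start_dl_container tf-dev
```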

jupyter notebook

To start jupyter notebook inside the container, the following options are needed:

$ jupyter notebook --ip=0.0.0.0 --allow-root

This is already available as the jn alias added to ~/.bashrc earlier.