Detailed tutorial: setting up a TensorFlow/Keras deep learning environment on a Linux server for Kaggle competitions
This article was first published on my personal blog: https://kezunlin.me/post/6b505d27/. Visit there for the latest version!
A full guide to installing and configuring deep learning environments on a Linux server.
Quick Guide
prepare
tools
- MobaXterm (for Windows)
- ssh + VS Code
For Windows: drag and drop files onto MobaXterm to upload them to the server, and use zip format for archives.
commands
view disk usage
du -d 1 -h
df -h
gpu and cpu usage
watch -n 1 nvidia-smi
top
view files and count
wc -l data.csv
# count how many folders
ls -lR | grep '^d' | wc -l
17
# count how many jpg files
ls -lR | grep '.jpg' | wc -l
1360
# view 10 images
ls train | head
ls test | head
link datasets
# link
ln -s src dest
ln -s /data_1/kezunlin/datasets/ dl4cv/datasets
scp
scp -r node17:~/dl4cv ~/git/
scp -r node17:~/.keras ~/
tmux for background tasks
tmux new -s notebook
tmux ls
tmux attach -t notebook
tmux detach
wget download
# wget
# resume an interrupted download
wget -c url
# background download for a large file
wget -b -c url
tail -f wget-log
# kill background wget
pkill -9 wget
tips for training a large model
terminal 1:
tmux new -s train
conda activate keras
time python train_alexnet.py
terminal 2:
tmux detach
tmux attach -t train
Then you can close VS Code safely: because the training runs inside tmux, it keeps running; a process started directly in a VS Code terminal would exit when VS Code is closed.
cuda driver and toolkits
see the cuda-toolkit documentation for the matching CUDA driver version
The cudatoolkit version you can install depends on the CUDA driver version.
install nvidia-drivers
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-cache search nvidia-*
# nvidia-384
# nvidia-396
sudo apt-get -y install nvidia-418
# test
nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
reboot to test again
https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch
install cuda-toolkit (drivers)
remove all previous nvidia drivers
sudo apt-get -y purge nvidia-*
Go to the NVIDIA CUDA downloads page and download cuda_10.1:
wget -b -c http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.243_418.87.00_linux.run
sudo sh cuda_10.1.243_418.87.00_linux.run
sudo ./cuda_10.1.243_418.87.00_linux.run
vim .bashrc
# for cuda and cudnn
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
check cuda driver version
> cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
>nvidia-smi
Tue Aug 27 17:36:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
> nvidia-smi -L
GPU 0: Quadro RTX 8000 (UUID: GPU-acb01c1b-776d-cafb-ea35-430b3580d123)
GPU 1: Quadro RTX 8000 (UUID: GPU-df7f0fb8-1541-c9ce-e0f8-e92bccabf0ef)
GPU 2: Quadro RTX 8000 (UUID: GPU-67024023-20fd-a522-dcda-261063332731)
GPU 3: Quadro RTX 8000 (UUID: GPU-7f9d6a27-01ec-4ae5-0370-f0c356327913)
> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
install conda
./Anaconda3-2019.03-Linux-x86_64.sh
[yes]
[yes]
config channels
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/menpo/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes
install libraries
conclusions:
- py37/keras: conda install -y tensorflow-gpu keras==2.2.5
- py37/torch: conda install -y pytorch torchvision
- py36/mxnet: conda install -y mxnet
keras 2.2.5 was released on 2019/8/23.
It adds new applications: ResNet101, ResNet152, ResNet50V2, ResNet101V2, and ResNet152V2.
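To confirm the new applications are available in your install, a minimal check might look like the sketch below (the exact import path is an assumption and can differ slightly between Keras versions):
# check that one of the newly added applications can be built (sketch, Keras 2.2.5)
from keras.applications.resnet_v2 import ResNet50V2

# weights are cached under ~/.keras/models/ on first use
model = ResNet50V2(weights="imagenet", include_top=True)
model.summary()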
common libraries
conda install -y scikit-learn scikit-image pandas matplotlib pillow opencv seaborn
pip install imutils progressbar pydot pylint
Use pip install imutils (not conda) to avoid downgrading tensorflow-gpu.
py37
cudatoolkit 10.0.130 0
cudnn 7.6.0 cuda10.0_0
tensorflow-gpu 1.13.1
py36
cudatoolkit anaconda/pkgs/main/linux-64::cudatoolkit-10.1.168-0
cudnn anaconda/pkgs/main/linux-64::cudnn-7.6.0-cuda10.1_0
tensorboard anaconda/pkgs/main/linux-64::tensorboard-1.14.0-py36hf484d3e_0
tensorflow anaconda/pkgs/main/linux-64::tensorflow-1.14.0-gpu_py36h3fb9ad6_0
tensorflow-base anaconda/pkgs/main/linux-64::tensorflow-base-1.14.0-gpu_py36he45bfe2_0
tensorflow-estima~ anaconda/cloud/conda-forge/linux-64::tensorflow-estimator-1.14.0-py36h5ca1d4c_0
tensorflow-gpu anaconda/pkgs/main/linux-64::tensorflow-gpu-1.14.0-h0d30ee6_0
imutils only supports py36 and py37.
mxnet only supports py35 and py36.
details
# remove py35
conda remove -n py35 --all
conda info --envs
conda create -n py37 python==3.7
conda activate py37
# common libraries
conda install -y scikit-learn pandas pillow opencv
pip install imutils
# imutils
conda search imutils
# py36 and py37
# Name Version Build Channel
imutils 0.5.2 py27_0 anaconda/cloud/conda-forge
imutils 0.5.2 py36_0 anaconda/cloud/conda-forge
imutils 0.5.2 py37_0 anaconda/cloud/conda-forge
# tensorflow-gpu and keras
conda install -y tensorflow-gpu keras
# install pytorch
conda install -y pytorch torchvision
# install mxnet
# method 1: pip
pip search mxnet
mxnet-cu80[mkl]/mxnet-cu90[mkl]/mxnet-cu91[mkl]/mxnet-cu92[mkl]/mxnet-cu100[mkl]/mxnet-cu101[mkl]
# method 2: conda
conda install mxnet
# py35 and py36
TensorFlow Object Detection API
home page: the tensorflow/models repository on GitHub
Download the tensorflow/models repository and rename models-master to tfmodels.
vim ~/.bashrc
export PYTHONPATH=/home/kezunlin/dl4cv:/data_1/kezunlin/tfmodels/research:$PYTHONPATH
source ~/.bashrc
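To confirm that the research directory is actually visible to Python, a quick check can be run (a minimal sketch; it assumes PYTHONPATH was exported as above, and fully using the Object Detection API also requires compiling its protobuf files):
# locate the object_detection package without importing it fully
import importlib.util
import os

spec = importlib.util.find_spec("object_detection")
if spec is None:
    print("object_detection not found; check PYTHONPATH")
else:
    print("object_detection found at:", os.path.dirname(spec.origin))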
jupyter notebook
conda activate py37
conda install -y jupyter
install kernels
python -m ipykernel install --user --name=py37
Installed kernelspec py37 in /home/kezunlin/.local/share/jupyter/kernels/py37
config for server
python -c "import IPython;print(IPython.lib.passwd())"
Enter password:
Verify password:
sha1:ef2fb2aacff2:4ea2998699638e58d10d594664bd87f9c3381c04
jupyter notebook --generate-config
Writing default config to: /home/kezunlin/.jupyter/jupyter_notebook_config.py
vim .jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.password = u'sha1:xxx:xxx'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
c.NotebookApp.enable_mathjax = True
run jupyter in the background
tmux new -s notebook
jupyter notebook
# ctrl+b, d: detach from the session without closing it
# ctrl+d: exit and close the session
Open the notebook page in a browser and enter the password.
test
py37
import cv2
cv2.__version__
import tensorflow as tf
import keras
import torch
import torchvision
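Beyond the imports succeeding, it is worth confirming that the GPUs are actually visible to the frameworks. A small sketch for TF 1.x and PyTorch (not part of the original test list):
import tensorflow as tf
import torch

# TensorFlow 1.x: True when a CUDA device is usable by the runtime
print("tensorflow GPU available:", tf.test.is_gpu_available())

# PyTorch: CUDA availability and number of visible devices
print("torch CUDA available:", torch.cuda.is_available(), "devices:", torch.cuda.device_count())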
cat .keras/keras.json
{
"epsilon": 1e-07,
"floatx": "float32",
"backend": "tensorflow",
"image_data_format": "channels_last"
}
py36
import mxnet
train demo
export
# use CPU only
export CUDA_VISIBLE_DEVICES=""
# use gpu 0 1
export CUDA_VISIBLE_DEVICES="0,1"
code
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0,1"
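On a shared server it can also help to stop TensorFlow from reserving all GPU memory at startup. A minimal sketch for TF 1.x with the Keras backend (an optional addition, not part of the steps above):
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # select GPUs before TensorFlow initializes

import tensorflow as tf
from keras import backend as K

# TF 1.x API: allocate GPU memory on demand instead of grabbing it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))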
start train
python train.py
.keras folder
view keras models and datasets
ls .keras/
datasets keras.json models
models saved to
/home/kezunlin/.keras/models/
datasets saved to /home/kezunlin/.keras/datasets/
models lists
xxx_kernels_notop.h5 for include_top = False
xxx_kernels.h5 for include_top = True
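For example, loading a pretrained network with or without its classification head pulls down the corresponding weight file into ~/.keras/models/. A sketch using VGG16 (any keras.applications model behaves the same way):
from keras.applications import VGG16

# downloads vgg16_..._kernels.h5 (full network including the FC classifier)
full_model = VGG16(weights="imagenet", include_top=True)

# downloads vgg16_..._kernels_notop.h5 (convolutional base only, for transfer learning)
base_model = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
print(full_model.output_shape, base_model.output_shape)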
Datasets
mnist
cifar10
to skip download
wget http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
mv ~/Download/cifar-10-python.tar.gz ~/.keras/datasets/cifar-10-batches-py.tar.gz
to load data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
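A short sanity check on what load_data() returns (a sketch; the shapes are the standard CIFAR-10 ones):
from keras.datasets import cifar10

# uses ~/.keras/datasets/cifar-10-batches-py.tar.gz if present, otherwise downloads it
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)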
flowers-17
animals
panda images are WRONG !!!
counts
ls -lR animals/cat | grep ".jpg" | wc -l
1000
ls -lR animals/dog | grep ".jpg" | wc -l
1000
ls -lR animals/panda | grep ".jpg" | wc -l
1000
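Since some images in this dataset are known to be bad (see the panda note above), it is worth scanning for files that cannot be decoded. A minimal sketch using Pillow (the animals/ path and .jpg-only layout are assumptions; adjust to your dataset):
import os
from PIL import Image

dataset_dir = "animals"  # assumed layout: animals/<class>/<image>.jpg

bad_files = []
for root, _, files in os.walk(dataset_dir):
    for name in files:
        if not name.lower().endswith(".jpg"):
            continue
        path = os.path.join(root, name)
        try:
            with Image.open(path) as img:
                img.verify()  # cheap integrity check without a full decode
        except Exception:
            bad_files.append(path)

print("corrupt or unreadable images:", len(bad_files))
for path in bad_files[:10]:
    print(path)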
kaggle cats vs dogs
caltech101
download in the background
wget -b -c http://www.vision.caltech.edu/Image_Datasets/Caltech101/101_ObjectCategories.tar.gz
Kaggle API
install and config
see kaggle-api
conda activate keras
conda install kaggle
# download kaggle.json
mv kaggle.json ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
cat kaggle.json
{"username":"xxx","key":"yyy"}
or by export
export KAGGLE_USERNAME=xxx
export KAGGLE_KEY=yyy
tips
- Go to your Kaggle account page and select 'Create API Token'; a kaggle.json file will be downloaded.
- Ensure kaggle.json is placed at ~/.kaggle/kaggle.json so the API can find it.
check version
kaggle --version
Kaggle API 1.5.5
commands overview
commands
kaggle competitions {list, files, download, submit, submissions, leaderboard}
kaggle datasets {list, files, download, create, version, init}
kaggle kernels {list, init, push, pull, output, status}
kaggle config {view, set, unset}
download datasets
kaggle competitions download -c dogs-vs-cats
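The same download can also be scripted from Python via the kaggle package (a sketch; it assumes kaggle.json or the KAGGLE_* environment variables are already configured as above):
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # reads ~/.kaggle/kaggle.json or KAGGLE_USERNAME/KAGGLE_KEY

# download all competition files into ./data (zip archives, like the CLI)
api.competition_download_files("dogs-vs-cats", path="data")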
show leaderboard
kaggle competitions leaderboard dogs-vs-cats --show
teamId teamName submissionDate score
------ --------------------------------- ------------------- -------
71046 Pierre Sermanet 2014-02-01 21:43:19 0.98533
66623 Maxim Milakov 2014-02-01 18:20:58 0.98293
72059 Owen 2014-02-01 17:04:40 0.97973
74563 Paul Covington 2014-02-01 23:05:20 0.97946
74298 we've been in KAIST 2014-02-01 21:15:30 0.97840
71949 orchid 2014-02-01 23:52:30 0.97733
set default competition
kaggle config set --name competition --value dogs-vs-cats
- competition is now set to: dogs-vs-cats
kaggle config set --name competition --value dogs-vs-cats-redux-kernels-edition
dogs-vs-cats
dogs-vs-cats-redux-kernels-edition
submit
kaggle c submissions
- Using competition: dogs-vs-cats
- No submissions found
kaggle c submit -f ./submission.csv -m "first submit"
The competition has already ended, so submissions are no longer accepted.
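For reference, a submission for dogs-vs-cats-redux-kernels-edition is a two-column CSV: an image id and the predicted probability that the image is a dog. A minimal sketch for writing it with pandas (the id range, column names, and the random predictions are placeholders based on the competition's sample submission):
import numpy as np
import pandas as pd

test_ids = np.arange(1, 12501)  # test images are assumed to be named 1.jpg ... 12500.jpg
probs = np.clip(np.random.rand(len(test_ids)), 0.02, 0.98)  # clipping guards against extreme log loss

submission = pd.DataFrame({"id": test_ids, "label": probs})
submission.to_csv("submission.csv", index=False)
print(submission.head())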
Nvidia-docker and containers
install
sudo apt-get -y install docker.io  # note: the Docker engine package on Ubuntu is docker.io, not docker
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
restart (optional)
cat /etc/docker/daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
sudo systemctl enable docker
sudo systemctl start docker
if errors occur:
Job for docker.service failed because the control process exited with error code.
See "systemctl status docker.service" and "journalctl -xe" for details.
check /etc/docker/daemon.json
test
sudo docker run --runtime=nvidia --rm nvidia/cuda:10.1-base nvidia-smi
sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi
Thu Aug 29 00:11:32 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro RTX 8000 Off | 00000000:02:00.0 Off | Off |
| 43% 67C P2 136W / 260W | 46629MiB / 48571MiB | 17% Default |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 Off | 00000000:03:00.0 Off | Off |
| 34% 54C P0 74W / 260W | 0MiB / 48571MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 8000 Off | 00000000:82:00.0 Off | Off |
| 34% 49C P0 73W / 260W | 0MiB / 48571MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 8000 Off | 00000000:83:00.0 Off | Off |
| 33% 50C P0 73W / 260W | 0MiB / 48571MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Add the user to the docker group (for example with usermod -aG docker <user>) so that docker commands no longer need sudo.
command refs
sudo nvidia-docker run --rm nvidia/cuda:10.1-base nvidia-smi
sudo nvidia-docker run -t -i --privileged nvidia/cuda bash
sudo docker run -it --name kzl -v /home/kezunlin/workspace/:/home/kezunlin/workspace nvidia/cuda
Reference
History
- 20190821: created.
Copyright
- Post author: kezunlin
- Post link: https://kezunlin.me/post/6b505d27/
- Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless otherwise stated.