Machine Learning for Programmers (2) - Introduction to PyTorch and Matrix Computation

Introduction to PyTorch

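PyTorch is one of the two most popular machine learning frameworks in the world today, alongside TensorFlow. It offers many conveniences: automatic differentiation that works out from the loss how the parameters should be adjusted, a set of wrapped mathematical functions, a collection of ready-made models, and a framework for combining models and training them. Its predecessor is torch, which was based on Lua, whereas PyTorch is based on Python. Although it exposes a Python interface, its core is written largely in C++, and it supports automatic parallelization and GPU acceleration, so it performs very well.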

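Traditional machine learning code is sometimes written entirely by hand, as in the example from the previous article, sometimes uses the numpy library to save part of the work, and sometimes relies on the classic algorithms packaged in scikit-learn (which is built on numpy). What sets PyTorch and TensorFlow apart from traditional machine learning is their focus on building neural networks, which are loosely modeled on the neurons of the human brain. This lets them handle very complex decisions that traditional machine learning cannot, such as recognizing the type of object in an image or driving a car autonomously. Whether these networks really work the way the human brain does is still much debated. Some people have started building networks that are meant to be closer in principle to the brain, such as GNN (Graph Neural Network), but these are not yet in practical use, so this series focuses on the network models that are already practical and widely used across industries.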

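PyTorch or TensorFlow: which one should you learn?

A very common question from beginners is whether to learn PyTorch or TensorFlow. According to current statistics, companies use TensorFlow more and researchers use PyTorch more, and PyTorch is growing fast enough that it looks set to overtake TensorFlow. My view is that it does not matter which one you learn. If you know PyTorch, picking up TensorFlow takes only a day or two, and the reverse is also true. Projects can also be ported between the two, so just pick whichever feels easier to learn. I find PyTorch easier (its wrappers are intuitive, and its use of Dynamic Graph makes debugging very easy), so this series is based on PyTorch.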

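Dynamic Graph and Static Graph

Machine learning frameworks can be divided into Dynamic Graph and Static Graph types, depending on whether the flow of computation has to be fixed in advance: Dynamic Graph does not require it, while Static Graph does. Take the formula wx + b = y. In a Dynamic Graph framework you can write wx and + b separately and compute them step by step, and at any point you can print an intermediate result or send it elsewhere to be recorded. In a Static Graph framework you must define the whole computation up front. You can only pass w, x, and b to the compiled computation and let it return y, and intermediate results can only be inspected with a dedicated debugger.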

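Generally speaking, Static Graph performs better than Dynamic Graph. TensorFlow (older versions) uses Static Graph and PyTorch uses Dynamic Graph, but in practice the performance difference is small: most of the resources go into matrix operations, and training in batches narrows the gap considerably. Incidentally, TensorFlow has supported Dynamic Graph since 1.7 and enables it by default in 2.0, although most people using TensorFlow still use Static Graph.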

# What Dynamic Graph looks like: custom code can be inserted at every step of the computation

def forward(w, x, b):

    wx = w * x

    print(wx)

    y = wx + b

    print(y)

    return y

w, x, b = 2.0, 3.0, 1.0  # example values so the call below can run

forward(w, x, b)

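# What Static Graph looks like (pseudocode): the whole computation has to be compiled in advance; compile here stands for a framework's graph compiler, not Python's built-in compile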

forward = compile("wx+b")

forward(w, x, b)
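To make the Dynamic Graph idea concrete, here is a minimal sketch in actual PyTorch. The branch on an intermediate value, the example numbers, and the requires_grad flags are assumptions added for illustration. The point is that ordinary Python control flow can sit in the middle of a computation, and automatic differentiation still follows whichever path actually ran.

```python
import torch

def forward(w, x, b):
    wx = w * x
    print("wx =", wx.item())   # intermediate results can be inspected at any time
    if wx.item() > 5:          # ordinary Python branching on an intermediate value
        wx = wx * 2
    return wx + b

# Illustrative values; requires_grad asks PyTorch to track gradients for w and b
w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
b = torch.tensor(1.0, requires_grad=True)

y = forward(w, x, b)   # wx = 6 > 5, so y = 6 * 2 + 1 = 13
y.backward()           # gradients follow the branch that actually ran
print(y.item(), w.grad.item(), b.grad.item())
```

With these values the doubled branch is taken, so the gradient of y with respect to w is 2 * x = 6 and the gradient with respect to b is 1.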

Installing PyTorch

Assuming python3 is already installed, run the following command to install PyTorch:

pip3 install torch

After that, use import torch in your Python code to load the PyTorch library.
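A quick way to confirm the installation worked is to import the library and print its version. This is only a sanity check, and the output depends on the version installed and on whether the machine has a usable GPU.

```python
import torch

print(torch.__version__)          # installed PyTorch version string
print(torch.cuda.is_available())  # True only if a usable CUDA GPU is detected
```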

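Basic PyTorch operations

Next, let's get familiar with the most basic operations in PyTorch. PyTorch uses the torch.Tensor type to represent single values, vectors (one-dimensional arrays), and matrices (multi-dimensional arrays) in a uniform way, and model parameters use this type too. (TensorFlow splits this into several types depending on the use, so PyTorch is simpler and clearer here.)

A torch.Tensor can be built with the torch.tensor function. Here are some simple examples (run in the Python REPL):

# Import PyTorch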

>>> import torch

# Create an integer tensor

>>> torch.tensor(1)

tensor(1)

# Create a floating-point tensor

>>> torch.tensor(1.0)

tensor(1.)

# The value in a single-value tensor can be extracted with the item function

>>> torch.tensor(1.0).item()

1.0

# Create a vector tensor from a one-dimensional array

>>> torch.tensor([1.0, 2.0, 3.0])

tensor([1., 2., 3.])

# Create a matrix tensor from a two-dimensional array

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])

tensor([[ 1.,  2.,  3.],

        [-1., -2., -3.]])

The numeric type of a tensor object is available through its dtype member:

>>> torch.tensor(1).dtype

torch.int64

>>> torch.tensor(1.0).dtype

torch.float32

>>> torch.tensor([1.0, 2.0, 3.0]).dtype

torch.float32

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]).dtype

torch.float32

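PyTorch supports the integer types torch.uint8, torch.int8, torch.int16, torch.int32, and torch.int64, the floating-point types torch.float16, torch.float32, and torch.float64, and the boolean type torch.bool. The number after the type name is its width in bits, and the u in uint8 means it is unsigned. In practice the vast majority of code uses only torch.float32: it is less precise than torch.float64, but it takes less memory and computes faster. Note that a tensor object can hold values of only one type; types cannot be mixed.

When creating a tensor object you can force a type with the dtype parameter: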

>>> torch.tensor(1, dtype=torch.int32)

tensor(1, dtype=torch.int32)

>>> torch.tensor([1.1, 2.9, 3.5], dtype=torch.int32)

tensor([1, 2, 3], dtype=torch.int32)

>>> torch.tensor(1, dtype=torch.int64)

tensor(1)

>>> torch.tensor(1, dtype=torch.float32)

tensor(1.)

>>> torch.tensor(1, dtype=torch.float64)

tensor(1., dtype=torch.float64)

>>> torch.tensor([1, 2, 3], dtype=torch.float64)

tensor([1., 2., 3.], dtype=torch.float64)

>>> torch.tensor([1, 2, 0], dtype=torch.bool)

tensor([ True,  True, False])
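The memory difference between types and the effect of converting between them can be checked directly. The sketch below uses to and element_size, which the text above does not cover, so treat it as a supplementary illustration rather than part of the original walkthrough.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])   # float32 by default
b = a.to(torch.float64)             # returns a converted copy; a is unchanged

# element_size() is the number of bytes used by each element
print(a.element_size(), b.element_size())   # 4 8

# Converting floats to an integer type drops the fractional part (truncates toward zero)
c = torch.tensor([-1.7, 2.9]).to(torch.int32)
print(c)

# Mixing an integer tensor with a float tensor promotes the result to float
d = torch.tensor([1, 2, 3]) + a
print(d.dtype)
```

This matches the earlier example where [1.1, 2.9, 3.5] became [1, 2, 3]: the values are truncated, not rounded.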

The shape of a tensor object is available through its shape member:

# The shape of a single-value tensor is empty

>>> torch.tensor(1).shape

torch.Size([])

>>> torch.tensor(1.0).shape

torch.Size([])

# The shape of a vector tensor has one value, the length of the vector

>>> torch.tensor([1.0]).shape

torch.Size([1])

>>> torch.tensor([1.0, 2.0, 3.0]).shape

torch.Size([3])

# The shape of a matrix tensor depends on its dimensions; each value is the size of one dimension, and this example has 2 rows and 3 columns

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]).shape

torch.Size([2, 3])
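The same idea extends beyond two dimensions. As a small sketch (the 2 x 3 x 4 tensor is just an example), dim returns the number of dimensions and numel the total number of elements, which is the product of the sizes in shape.

```python
import torch

t = torch.zeros(2, 3, 4)   # 2 blocks, each with 3 rows and 4 columns
print(t.shape)             # torch.Size([2, 3, 4])
print(t.dim())             # 3
print(t.numel())           # 24

rows, cols = torch.zeros(2, 3).shape   # torch.Size unpacks like a tuple
print(rows, cols)
```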

Arithmetic works between a tensor object and a number, and between two tensor objects:

>>> torch.tensor(1.0) * 2

tensor(2.)

>>> torch.tensor(1.0) * torch.tensor(2.0)

tensor(2.)

>>> torch.tensor(3.0) * torch.tensor(2.0)

tensor(6.)

Vectors and matrices can also be operated on in bulk (the work is parallelized internally):

# Operations between a vector and a number

>>> torch.tensor([1.0, 2.0, 3.0])

tensor([1., 2., 3.])

>>> torch.tensor([1.0, 2.0, 3.0]) * 3

tensor([3., 6., 9.])

>>> torch.tensor([1.0, 2.0, 3.0]) * 3 - 1

tensor([2., 5., 8.])

# Operations between a matrix and a single-value tensor object

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])

tensor([[ 1.,  2.,  3.],

        [-1., -2., -3.]])

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]) / torch.tensor(2)

tensor([[ 0.5000,  1.0000,  1.5000],

        [-0.5000, -1.0000, -1.5000]])

# Operations between a matrix and a vector whose length equals the matrix's last dimension

>>> torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]]) * torch.tensor([1.0, 1.5, 2.0])

tensor([[ 1.,  3.,  6.],

        [-1., -3., -6.]])
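This matching of a smaller tensor against a larger one is usually called broadcasting. The sketch below contrasts a vector applied across every row with a column-shaped tensor applied across every column; the col tensor and its values are made up for the example.

```python
import torch

m = torch.tensor([[1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]])   # shape (2, 3)
row = torch.tensor([1.0, 1.5, 2.0])                       # shape (3,)
col = torch.tensor([[10.0], [100.0]])                     # shape (2, 1)

by_row = m * row   # row is reused for each of the 2 rows
by_col = m * col   # col is reused for each of the 3 columns

print(by_row)
print(by_col)
```

Here by_row reproduces the result shown above, while by_col scales the first row by 10 and the second row by 100.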

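Operations between tensor objects generally produce a new tensor object. To avoid creating a new object (for better performance), use the functions whose names end in _, which modify the existing object:

# Creates a new object and leaves the original unchanged; add means the same as +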

>>> a = torch.tensor([1,2,3])

>>> b = torch.tensor([7,8,9])

>>> a.add(b)

tensor([ 8, 10, 12])

>>> a

tensor([1, 2, 3])

# Operates on the existing object, avoiding a new one

>>> a.add_(b)

tensor([ 8, 10, 12])

>>> a

tensor([ 8, 10, 12])
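One way to see that the _ variant really reuses memory is to compare addresses with data_ptr, which returns the address of a tensor's first element. This is a small check added for illustration, not something the examples above rely on.

```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([7, 8, 9])

before = a.data_ptr()      # address of a's first element
c = a.add(b)               # new tensor with its own memory
a.add_(b)                  # writes the result into a's existing memory

print(c.data_ptr() != before)   # the result of add lives elsewhere
print(a.data_ptr() == before)   # a still points at the same memory
```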

PyTorch also provides a set of convenient functions for the maximum, minimum, mean, standard deviation, and so on:

>>> torch.tensor([1.0, 2.0, 3.0])

tensor([1., 2., 3.])

>>> torch.tensor([1.0, 2.0, 3.0]).min()

tensor(1.)

>>> torch.tensor([1.0, 2.0, 3.0]).max()

tensor(3.)

>>> torch.tensor([1.0, 2.0, 3.0]).mean()

tensor(2.)

>>> torch.tensor([1.0, 2.0, 3.0]).std()

tensor(1.)
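These functions also accept a dim argument to reduce along one dimension of a matrix instead of over all elements. The matrix below is an arbitrary example. Note also that std computes the sample standard deviation by default (dividing by n - 1), which is why [1, 2, 3] gives exactly 1.

```python
import torch

m = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(m.mean(dim=0))   # mean of each column
print(m.sum(dim=1))    # sum of each row

values, indices = m.max(dim=1)   # per-row maximum and the column where it occurs
print(values, indices)
```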

PyTorch also supports comparing tensor objects to produce a boolean tensor:

# Comparing a tensor object with a number

>>> torch.tensor([1.0, 2.0, 3.0]) > 1.0

tensor([False,  True,  True])

>>> torch.tensor([1.0, 2.0, 3.0]) <= 2.0

tensor([ True,  True, False])

# Comparing a tensor object with another tensor object

>>> torch.tensor([1.0, 2.0, 3.0]) > torch.tensor([1.1, 1.9, 3.0])

tensor([False,  True, False])

>>> torch.tensor([1.0, 2.0, 3.0]) <= torch.tensor([1.1, 1.9, 3.0])

tensor([ True, False,  True])
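A boolean tensor is mostly useful as a mask. The following sketch shows three common uses that go slightly beyond the text above: selecting elements, counting matches, and replacing the elements that did not match.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])
mask = t > 1.0

selected = t[mask]                                  # keeps elements where mask is True
count = mask.sum().item()                           # True counts as 1, False as 0
zeroed = torch.where(mask, t, torch.zeros_like(t))  # keeps matches, zero elsewhere

print(selected, count, zeroed)
```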

PyTorch can also create tensor objects of a given shape:

# Create a 2-row, 3-column matrix tensor filled with 0

>>> torch.zeros(2, 3)

tensor([[0., 0., 0.],

        [0., 0., 0.]])

# Create a 2-row, 3-column matrix tensor filled with 1

>>> torch.ones(2, 3)

tensor([[1., 1., 1.],

        [1., 1., 1.]])

# Create a 3-row, 2-column matrix tensor filled with 100 (a float fill value keeps the result floating-point in recent versions)

>>> torch.full((3, 2), 100.0)

tensor([[100., 100.],

        [100., 100.],

        [100., 100.]])

# Create a 3-row, 3-column matrix tensor of random floats in the range [0, 1)

>>> torch.rand(3, 3)

tensor([[0.4012, 0.2412, 0.1532],

        [0.1178, 0.2319, 0.4056],

        [0.7879, 0.8318, 0.7452]])

# Create a 3-row, 3-column matrix tensor of random integers in the range [1, 10]

>>> (torch.rand(3, 3) * 10 + 1).long()

tensor([[ 8,  1,  5],

        [ 8,  6,  5],

        [ 1,  6, 10]])

# Has the same effect as the expression above

>>> torch.randint(1, 11, (3, 3))

tensor([[7, 1, 3],

        [7, 9, 8],

        [4, 7, 3]])
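Random values differ on every run, so your output will not match the numbers shown above. If you need repeatable results, for example while debugging, one option is to fix the seed with torch.manual_seed. The seed value 0 below is arbitrary.

```python
import torch

torch.manual_seed(0)                 # fix the random number generator state
first = torch.randint(1, 11, (3, 3))

torch.manual_seed(0)                 # same seed, same sequence
second = torch.randint(1, 11, (3, 3))

print(torch.equal(first, second))    # both draws are identical
print(first.min().item() >= 1, first.max().item() <= 10)   # values stay within [1, 10]
```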

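The operations covered here are only some of the common ones. To learn about the other operations tensor objects support, see the following documentation: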

https://pytorch.org/docs/stable/tensors.html

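The data structure PyTorch uses to store tensors

To reduce memory usage and speed up access, PyTorch stores a tensor in one contiguous block of storage (whether in system memory or GPU memory), regardless of whether the tensor is a single value, a vector, or a matrix.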

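We can use storage to look at the storage a tensor object uses (recent PyTorch releases mark storage() as deprecated, so you may see a warning or a slightly different footer in the output):

# The storage of a single value has length 1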

>>> torch.tensor(1).storage()

 1

[torch.LongStorage of size 1]

# The storage of a vector has the same length as the vector

>>> torch.tensor([1, 2, 3], dtype=torch.float32).storage()

 1.0

 2.0

 3.0

[torch.FloatStorage of size 3]

# The storage of a matrix has a length equal to the product of all its dimensions; here 2 rows by 3 columns gives 6 elements

>>> torch.tensor([[1, 2, 3], [-1, -2, -3]], dtype=torch.float64).storage()

 1.0

 2.0

 3.0

 -1.0

 -2.0

 -3.0

[torch.DoubleStorage of size 6]

PyTorch uses stride, together with shape, to determine how a tensor object's dimensions are laid out in its storage:

# The storage has 6 elements

>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]

# The first dimension is 2 and the second is 3 (2 rows, 3 columns)

>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).shape

torch.Size([2, 3])

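# stride gives, for each dimension, the distance in storage between neighboring elements along that dimension

# Moving one step along the first dimension skips 3 elements (6 elements split into 2 groups); moving one step along the second dimension skips 1 element (3 elements per group)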

>>> torch.tensor([[1, 2, 3], [-1, -2, -3]])

tensor([[ 1,  2,  3],

        [-1, -2, -3]])

>>> torch.tensor([[1, 2, 3], [-1, -2, -3]]).stride()

(3, 1)
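Put differently, the element at row i and column j sits at position i * stride[0] + j * stride[1] in the storage. The helper element_at below is a hypothetical name written only for this illustration, and the flat list is simply the storage contents shown above typed out by hand.

```python
import torch

t = torch.tensor([[1, 2, 3], [-1, -2, -3]])
flat = [1, 2, 3, -1, -2, -3]   # storage contents, copied by hand from the output above

def element_at(i, j):
    """Look up t[i][j] by hand from the flat storage using stride."""
    s0, s1 = t.stride()        # (3, 1) for this tensor
    return flat[t.storage_offset() + i * s0 + j * s1]

print(element_at(1, 2), t[1, 2].item())   # both give the last element
```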

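One powerful feature of PyTorch is that the view function can change the dimensions of a tensor object (it changes stride internally) without allocating new storage or copying elements:

# Create a 2-row, 3-column matrix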

>>> a = torch.tensor([[1, 2, 3], [-1, -2, -3]])

>>> a

tensor([[ 1,  2,  3],

        [-1, -2, -3]])

>>> a.shape

torch.Size([2, 3])

>>> a.stride()

(3, 1)

# Change the dimensions to 3 rows and 2 columns

>>> b = a.view(3, 2)

>>> b

tensor([[ 1,  2],

        [ 3, -1],

        [-2, -3]])

>>> b.shape

torch.Size([3, 2])

>>> b.stride()

(2, 1)

# Convert to a vector

>>> c = b.view(6)

>>> c

tensor([ 1,  2,  3, -1, -2, -3])

>>> c.shape

torch.Size([6])

>>> c.stride()

(1,)

# They share the same storage

>>> a.storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]

>>> b.storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]

>>> c.storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]
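A practical consequence of sharing storage is that writing through one view is visible through the others. The short sketch below demonstrates this; the value 100 is arbitrary.

```python
import torch

a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
b = a.view(3, 2)

b[0, 0] = 100                         # write through the view
print(a)                              # a sees the change as well
print(a.data_ptr() == b.data_ptr())   # both start at the same memory address
```

This is worth keeping in mind when modifying a tensor in place: any other tensor created from it with view changes too.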

Another benefit of using stride to determine dimensions is that a transpose can be implemented while sharing the same storage:

# Create a 2-row, 3-column matrix

>>> a = torch.tensor([[1, 2, 3], [-1, -2, -3]])

>>> a

tensor([[ 1,  2,  3],

        [-1, -2, -3]])

>>> a.shape

torch.Size([2, 3])

>>> a.stride()

(3, 1)

# Use transpose to swap the dimensions (rows become columns)

>>> b = a.transpose(0, 1)

>>> b

tensor([[ 1, -1],

        [ 2, -2],

        [ 3, -3]])

>>> b.shape

torch.Size([3, 2])

>>> b.stride()

(1, 3)

# They share the same storage

>>> a.storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]

>>> b.storage()

 1

 2

 3

 -1

 -2

 -3

[torch.LongStorage of size 6]

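Internally, transpose simply swaps the stride values of the specified dimensions. Based on the earlier description, think about how the elements are divided up in the transposed matrix.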
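As a way to check your answer, the following sketch walks the unchanged storage by hand using the swapped stride (1, 3) and compares the result with what transpose returns. The plain Python list stands in for the storage and is typed out by hand.

```python
import torch

storage = [1, 2, 3, -1, -2, -3]   # same contents as before the transpose
shape, stride = (3, 2), (1, 3)    # shape and stride after transpose(0, 1)

# Row i, column j lives at i * stride[0] + j * stride[1]
rows = [
    [storage[i * stride[0] + j * stride[1]] for j in range(shape[1])]
    for i in range(shape[0])
]
print(rows)

a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
print(a.transpose(0, 1).tolist() == rows)   # the manual walk matches PyTorch
```

Each row of the transposed matrix now picks elements that are 3 apart in storage, so its rows are no longer stored next to each other.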

Now think again: what happens if you use the view function to turn the transposed matrix into a vector? Does it become [1, -1, 2, -2, 3, -3]?

In fact, this operation raises an error.
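The failure can be reproduced with a few lines. The sketch below also shows two possible workarounds, contiguous and reshape, both of which copy the elements into a new row-major layout when needed. Treat the workarounds as one option among several rather than the recommended fix.

```python
import torch

a = torch.tensor([[1, 2, 3], [-1, -2, -3]])
b = a.transpose(0, 1)

print(b.is_contiguous())       # the transposed layout is not row-major

view_failed = False
try:
    b.view(6)                  # no single stride can describe this order
except RuntimeError as e:
    view_failed = True
    print("view failed:", e)

c = b.contiguous().view(6)     # copy into new row-major storage, then view
d = b.reshape(6)               # reshape copies when a plain view is not possible
print(c, d)
```

After the copy, c no longer shares storage with a, so later changes to one are not reflected in the other.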