Mu Li (李沐): Dive into Deep Learning V2

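About this post

These are informal notes taken while working through the course; feel free to take whatever is useful.

Course reference (Bilibili): https://space.bilibili.com/1567748478?spm_id_from=333.788.0.0

The slides and other materials are linked in the description of the original videos.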

Linear regression from scratch

Import packages

%matplotlib inline

import random

import torch

from d2l import torch as d2l

Generate a synthetic dataset

def synthetic_data(w, b, num_examples):  #@save

    """Generate y = Xw + b + noise."""

    X = torch.normal(0, 1, (num_examples, len(w)))

    y = torch.matmul(X, w) + b

    y += torch.normal(0, 0.01, y.shape)

    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])

true_b = 4.2

features, labels = synthetic_data(true_w, true_b, 1000)

print('features:', features[0],'\nlabel:', labels[0])

d2l.set_figsize()

d2l.plt.scatter(features[:, 1].detach().numpy(), labels.detach().numpy(), 1);

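The result is as follows:

Read the dataset

Define a data_iter function that takes the batch size, the feature matrix, and the label vector as input and yields minibatches of size batch_size.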

def data_iter(batch_size, features, labels):

    num_examples = len(features)

    indices = list(range(num_examples))

    # The examples are read in random order

    random.shuffle(indices)

    for i in range(0, num_examples, batch_size):

        batch_indices = torch.tensor(

            indices[i: min(i + batch_size, num_examples)])

        yield features[batch_indices], labels[batch_indices]

batch_size = 10

for X, y in data_iter(batch_size, features, labels):

    print(X, '\n', y)

    break

The result is as follows:

Define the model

Define the model, which relates its inputs and parameters to its outputs.

w = torch.normal(0, 0.01, size=(2,1), requires_grad=True)

b = torch.zeros(1, requires_grad=True)

def linreg(X, w, b):  #@save

    """The linear regression model."""

    return torch.matmul(X, w) + b

Define the loss function

def squared_loss(y_hat, y):  #@save

    """Squared loss."""

    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

Define the optimization algorithm

def sgd(params, lr, batch_size):  #@save

    """Minibatch stochastic gradient descent."""

    with torch.no_grad():

        for param in params:

            param -= lr * param.grad / batch_size

            param.grad.zero_()
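To see why sgd divides by batch_size, here is a minimal plain-Python sketch of a single update for a one-feature model y_hat = w * x + b with the loss (y_hat - y)**2 / 2. The function name sgd_step and the toy data are assumptions made for illustration; the gradients are written out by hand instead of using autograd.

```python
# Plain-Python sketch of one minibatch SGD step (no PyTorch required).
# sgd_step and the toy data below are illustrative, not part of d2l.

def sgd_step(w, b, xs, ys, lr):
    """Return updated (w, b) after one step on the minibatch (xs, ys)."""
    batch_size = len(xs)
    # Gradients of the *summed* loss, which is what l.sum().backward() yields
    grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum((w * x + b - y) for x, y in zip(xs, ys))
    # Dividing by batch_size turns the summed gradient into an average,
    # so the step size does not grow with the batch size
    w -= lr * grad_w / batch_size
    b -= lr * grad_b / batch_size
    return w, b

xs = [1.0, 2.0]
ys = [3.0, 5.0]  # generated by y = 2x + 1
w, b = sgd_step(0.0, 0.0, xs, ys, lr=0.1)
print(w, b)
```

Repeating the step many times on this toy minibatch moves (w, b) toward (2, 1), the values used to generate the data.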

Training

lr = 0.03

num_epochs = 3

net = linreg

loss = squared_loss

for epoch in range(num_epochs):

    for X, y in data_iter(batch_size, features, labels):

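        l = loss(net(X, w, b), y)  # minibatch loss on X and y

        # l has shape (batch_size, 1) rather than being a scalar, so sum all

        # of its elements and compute the gradient with respect to [w, b]

        l.sum().backward()

        sgd([w, b], lr, batch_size)  # update the parameters using their gradients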

    with torch.no_grad():

        train_l = loss(net(features, w, b), labels)

        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

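The result is as follows:

Because we synthesized the dataset ourselves, we know the true parameters. We can therefore judge how well training worked by comparing the true parameters with the learned ones. Indeed, they turn out to be very close.

print(f'error in estimating w: {true_w - w.reshape(true_w.shape)}')

print(f'error in estimating b: {true_b - b}')

The result is as follows: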

Concise implementation of linear regression

Generate the dataset

import numpy as np

import torch

from torch.utils import data

from d2l import torch as d2l 

true_w = torch.tensor([2, -3.4])

true_b = 4.2

features, labels = d2l.synthetic_data(true_w, true_b, 1000)

Read the dataset

Use the framework's existing API to read the data.

def load_array(data_arrays, batch_size, is_train=True):  #@save

    """Construct a PyTorch data iterator."""

    dataset = data.TensorDataset(*data_arrays)

    return data.DataLoader(dataset, batch_size, shuffle=is_train)

batch_size = 10

data_iter = load_array((features, labels), batch_size)

next(iter(data_iter))

The result is as follows:

Define the model

Use the framework's predefined layers.

# nn is short for neural networks

from torch import nn

net = nn.Sequential(nn.Linear(2, 1))

net[0].weight.data.normal_(0, 0.01)

net[0].bias.data.fill_(0)

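The result is as follows:

Define the loss function

The mean squared error is computed with the MSELoss class, also known as the squared L2 norm. By default, it returns the average loss over all examples.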

loss = nn.MSELoss()
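One detail worth keeping in mind: the from-scratch squared_loss returns a per-example loss with a factor of 1/2, while nn.MSELoss() with its default reduction averages the squared errors and has no 1/2 factor. The plain-Python sketch below illustrates the difference; mse_mean is a hypothetical helper that mimics the default behavior and is not a PyTorch function.

```python
# Plain-Python comparison of the two loss conventions (no PyTorch required).

def squared_loss(y_hat, y):
    """Per-example loss from the from-scratch section: (y_hat - y)^2 / 2."""
    return [(p - t) ** 2 / 2 for p, t in zip(y_hat, y)]

def mse_mean(y_hat, y):
    """Mean of squared errors without the 1/2 factor (default MSELoss behavior)."""
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)

y_hat = [2.5, 0.0, 2.0]
y = [3.0, -0.5, 2.0]
per_example = squared_loss(y_hat, y)  # one value per example
mean_loss = mse_mean(y_hat, y)        # a single scalar
print(per_example, mean_loss)
```

Because of this, the loss values printed by the two implementations differ by roughly a factor of two even when the learned parameters are equally good.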

Define the optimization algorithm

trainer = torch.optim.SGD(net.parameters(), lr=0.03)

Training

num_epochs = 3

for epoch in range(num_epochs):

    for X, y in data_iter:

        l = loss(net(X), y)

        trainer.zero_grad()

        l.backward()

        trainer.step()

    l = loss(net(features), labels)

    print(f'epoch {epoch + 1}, loss {l:f}')

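The result is as follows:

Compare the true parameters used to generate the dataset with the model parameters learned from the finite data.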

w = net[0].weight.data

print('error in estimating w:', true_w - w.reshape(true_w.shape))

b = net[0].bias.data

print('error in estimating b:', true_b - b)

The result is as follows:

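Softmax regression (classification)

From regression to multiclass classification

--Squared loss

--Uncalibrated scale

--Calibrated scale

Cross-entropy loss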

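To summarize:

1. Softmax regression is a multiclass classification model.

2. The softmax operation produces a prediction confidence for each class.

3. Cross-entropy measures the difference between the prediction and the label.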
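The three points above can be written compactly as follows. The notation is an assumption chosen for this note: o is the vector of unnormalized scores (logits), y is the one-hot label vector, and c is the index of the true class.

```latex
% Softmax turns the logits o into a probability distribution \hat{y}
\hat{y}_i = \operatorname{softmax}(\mathbf{o})_i
          = \frac{\exp(o_i)}{\sum_{k} \exp(o_k)}

% Cross-entropy between the one-hot label y and the prediction \hat{y}
l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_{i} y_i \log \hat{y}_i
                                = -\log \hat{y}_{c}

% Gradient with respect to the logits: predicted probability minus label
\frac{\partial l}{\partial o_i} = \operatorname{softmax}(\mathbf{o})_i - y_i
```

The last line shows that the gradient is simply the gap between the predicted probability and the true label.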

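Three commonly used loss functions

L2 loss (squared loss)

Blue: the loss as a function of the prediction y', with y = 0

Green: the likelihood function, e^(-L)

Orange: the gradient

Red arrows: the magnitude of the gradient

L1 loss (absolute value loss)

Huber's robust loss (combines the two above)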
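As a rough illustration of how the three losses differ, here is a plain-Python sketch for a single prediction. The threshold parameter delta and its default of 1 are assumptions made for this example; with delta = 1 the Huber loss is quadratic for errors up to 1 and linear beyond that.

```python
# Plain-Python sketch of the three losses for a single (y, y_hat) pair.

def l2_loss(y, y_hat):
    """Squared loss: the gradient shrinks as the prediction approaches y."""
    return 0.5 * (y - y_hat) ** 2

def l1_loss(y, y_hat):
    """Absolute value loss: constant gradient magnitude, not smooth at 0."""
    return abs(y - y_hat)

def huber_loss(y, y_hat, delta=1.0):
    """Quadratic near zero (like L2), linear for large errors (like L1)."""
    r = abs(y - y_hat)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

for y_hat in (0.5, 1.0, 3.0):
    print(y_hat, l2_loss(0.0, y_hat), l1_loss(0.0, y_hat), huber_loss(0.0, y_hat))
```

The two branches of huber_loss agree at an error of exactly delta, so the loss stays continuous while avoiding both the large gradients of L2 far from the target and the kink of L1 at zero.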

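Using the image classification dataset

The MNIST dataset (LeCun, Bottou, Bengio, et al., 1998) is one of the most widely used datasets in image classification, but it is too simple to serve as a benchmark. We will use the similar but more complex Fashion-MNIST dataset (Xiao, Rasul, and Vollgraf, 2017).

Import libraries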

%matplotlib inline

import torch

import torchvision

from torch.utils import data

from torchvision import transforms

from d2l import torch as d2l

d2l.use_svg_display()

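Read the dataset

Use the framework's built-in functions (torchvision.datasets.xxx) to download the Fashion-MNIST dataset and read it into memory.

# ToTensor converts the image data from PIL type to 32-bit floating point

# and divides by 255 so that all pixel values lie between 0 and 1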

trans = transforms.ToTensor()

mnist_train = torchvision.datasets.FashionMNIST(

    root="../data", train=True, transform=trans, download=True)

mnist_test = torchvision.datasets.FashionMNIST(

    root="../data", train=False, transform=trans, download=True)

len(mnist_train), len(mnist_test)

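# Result: (60000, 10000)

mnist_train[0][0].shape

# Result: torch.Size([1, 28, 28])

# Each input image is 28 pixels high and 28 pixels wide. The dataset consists of grayscale images, so the number of channels is 1.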

Functions for visualizing the dataset

Fashion-MNIST contains 10 categories: t-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, and ankle boot. The following function converts numeric label indices to their text names.

# Get the labels

def get_fashion_mnist_labels(labels):  #@save

    """Return the text labels of the Fashion-MNIST dataset."""

    text_labels = ['t-shirt', 'trouser', 'pullover', 'dress', 'coat',

                   'sandal', 'shirt', 'sneaker', 'bag', 'ankle boot']

    return [text_labels[int(i)] for i in labels]

# Plot the images

def show_images(imgs, num_rows, num_cols, titles=None, scale=1.5):  #@save

    """Plot a list of images."""

    figsize = (num_cols * scale, num_rows * scale)

    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)

    axes = axes.flatten()

    for i, (ax, img) in enumerate(zip(axes, imgs)):

        if torch.is_tensor(img):

            # Image tensor

            ax.imshow(img.numpy())

        else:

            # PIL image

            ax.imshow(img)

        ax.axes.get_xaxis().set_visible(False)

        ax.axes.get_yaxis().set_visible(False)

        if titles:

            ax.set_title(titles[i])

    return axes

Below are the images and corresponding labels of the first few examples in the training dataset.

X, y = next(iter(data.DataLoader(mnist_train, batch_size=18)))

show_images(X.reshape(18, 28, 28), 2, 9, titles=get_fashion_mnist_labels(y));

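The result is as follows:

Read minibatches

Use the built-in data iterator instead of creating one from scratch. In each iteration, the data loader reads a minibatch of data of size batch_size.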

batch_size = 256

def get_dataloader_workers():  #@save

    """Use 4 processes to read the data."""

    return 4

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,

                             num_workers=get_dataloader_workers())

Check how long it takes to read the training data; data reading should be somewhat faster than model training.

timer = d2l.Timer()

for X, y in train_iter:

    continue

f'{timer.stop():.2f} sec'

Putting all the components together

def load_data_fashion_mnist(batch_size, resize=None):  #@save

    """Download the Fashion-MNIST dataset and load it into memory."""

    trans = [transforms.ToTensor()]

    if resize:

        trans.insert(0, transforms.Resize(resize))

    trans = transforms.Compose(trans)

    mnist_train = torchvision.datasets.FashionMNIST(

        root="../data", train=True, transform=trans, download=True)

    mnist_test = torchvision.datasets.FashionMNIST(

        root="../data", train=False, transform=trans, download=True)

    return (data.DataLoader(mnist_train, batch_size, shuffle=True,

                            num_workers=get_dataloader_workers()),

            data.DataLoader(mnist_test, batch_size, shuffle=False,

                            num_workers=get_dataloader_workers()))

Test the function's image resizing feature by specifying the resize argument.

train_iter, test_iter = load_data_fashion_mnist(32, resize=64)

for X, y in train_iter:

    print(X.shape, X.dtype, y.shape, y.dtype)

    break

The result is as follows:

Softmax regression from scratch

Import packages and set the batch size of the iterators

import torch

from IPython import display

from d2l import torch as d2l

batch_size = 256

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

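Initialize the model parameters

Each image is flattened and treated as a vector of length 784. Because the dataset has 10 categories, the network's output dimension is 10.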

num_inputs = 784

num_outputs = 10

W = torch.normal(0, 0.01, size=(num_inputs, num_outputs), requires_grad=True)

b = torch.zeros(num_outputs, requires_grad=True)

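Define the softmax operation

Given a matrix X, we can sum over all of its elements or only along a given axis.

The first argument specifies the dimension to sum over.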

X = torch.tensor([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

X.sum(0, keepdim=True), X.sum(1, keepdim=True)

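The result is as follows:

Implement softmax

Softmax is implemented in three steps:

1. Exponentiate each term (using exp);

2. Sum over each row (each example in the minibatch is one row) to get the normalization constant for each example;

3. Divide each row by its normalization constant so that the result sums to 1.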

def softmax(X):

    X_exp = torch.exp(X)

    partition = X_exp.sum(1, keepdim=True)

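    return X_exp / partition  # broadcasting is applied here

Every element is turned into a nonnegative number, and each row sums to 1, as a probability distribution requires.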

X = torch.normal(0, 1, (2, 5))

X_prob = softmax(X)

X_prob, X_prob.sum(1)
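A caveat about the three-step softmax: exponentiating large scores overflows. In plain Python, math.exp raises OverflowError; with floating-point tensors the exponential typically becomes inf and the division produces nan. A common remedy, sketched below in plain Python, is to subtract the row maximum before exponentiating, which leaves the result unchanged mathematically. The name stable_softmax is an assumption for this illustration.

```python
import math

def softmax(row):
    """Softmax of one row of scores, following the three steps."""
    exps = [math.exp(v) for v in row]     # 1. exponentiate each term
    partition = sum(exps)                 # 2. normalization constant
    return [e / partition for e in exps]  # 3. divide by the constant

def stable_softmax(row):
    """Same result as softmax, but shifted so the largest exponent is 0."""
    m = max(row)
    return softmax([v - m for v in row])

# softmax([1000.0, 1001.0, 1002.0]) would overflow in math.exp
probs = stable_softmax([1000.0, 1001.0, 1002.0])
print(probs, sum(probs))
```

Shifting every score by the same constant cancels in the ratio, so stable_softmax([1000, 1001, 1002]) equals softmax([0, 1, 2]).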

Define the model

def net(X):

    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)

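Define the loss function

Create sample data y_hat containing the predicted probabilities of 2 examples over 3 classes, along with their labels y. Use y as indices into the probabilities in y_hat.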

y = torch.tensor([0, 2])

y_hat = torch.tensor([[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]])

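y_hat[[0, 1], y]  # for example 0, pick the probability of class y[0]; for example 1, that of class y[1]

The result is as follows:

Implement the cross-entropy loss function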

def cross_entropy(y_hat, y):

    return - torch.log(y_hat[range(len(y_hat)), y])

cross_entropy(y_hat, y)

The result is as follows:
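The same computation can be traced by hand in plain Python, which makes the indexing step explicit: for each example, pick the probability assigned to its true class, then take the negative logarithm. This is a sketch using lists instead of tensors; the variable names picked and losses are chosen for this example.

```python
import math

y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]  # predicted probabilities
y = [0, 2]                                   # true class per example

def cross_entropy(y_hat, y):
    """Negative log of the probability assigned to each true class."""
    return [-math.log(row[label]) for row, label in zip(y_hat, y)]

picked = [row[label] for row, label in zip(y_hat, y)]  # list version of y_hat[[0, 1], y]
losses = cross_entropy(y_hat, y)
print(picked, losses)
```

The first example assigns only 0.1 to its true class and is penalized more heavily (about 2.30) than the second, which assigns 0.5 (about 0.69).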

Classification accuracy

Compare the predicted classes with the true labels y.

def accuracy(y_hat, y):  #@save

    """Compute the number of correct predictions."""

    if len(y_hat.shape) > 1 and y_hat.shape[1] > 1:

        y_hat = y_hat.argmax(axis=1)

    cmp = y_hat.type(y.dtype) == y

    return float(cmp.type(y.dtype).sum())

accuracy(y_hat, y) / len(y)

The result is as follows:
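For the toy y_hat and y used here, the accuracy can be checked by hand. The plain-Python sketch below mirrors the logic of accuracy with lists: take the index of the largest probability in each row as the predicted class and count the matches. The name count_correct is an assumption for this example.

```python
y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]  # predicted probabilities
y = [0, 2]                                   # true classes

def count_correct(y_hat, y):
    """Count rows whose argmax equals the true label."""
    correct = 0
    for row, label in zip(y_hat, y):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax of the row
        correct += int(predicted == label)
    return float(correct)

acc = count_correct(y_hat, y) / len(y)
print(acc)
```

Both rows predict class 2, but only the second example has label 2, so one of two predictions is correct.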

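We can evaluate the accuracy of any model net.

def evaluate_accuracy(net, data_iter):  #@save

    """Compute the accuracy of a model on the given dataset."""

    if isinstance(net, torch.nn.Module):

        net.eval()  # set the model to evaluation mode

    metric = Accumulator(2)  # number of correct predictions, number of predictions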

    with torch.no_grad():

        for X, y in data_iter:

            metric.add(accuracy(net(X), y), y.numel())

    return metric[0] / metric[1]

# The Accumulator instance holds 2 variables: the number of correct predictions and the total number of predictions

class Accumulator:  #@save

    """Accumulate sums over n variables."""

    def __init__(self, n):

        self.data = [0.0] * n

    def add(self, *args):

        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def reset(self):

        self.data = [0.0] * len(self.data)

    def __getitem__(self, idx):

        return self.data[idx]
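Since Accumulator is plain Python, its behavior is easy to try in isolation. The sketch below repeats the class (without reset) so that it runs on its own, and feeds it two hypothetical batches; the batch counts are made up for illustration.

```python
class Accumulator:
    """Accumulate sums over n variables (same logic as the class above)."""

    def __init__(self, n):
        self.data = [0.0] * n

    def add(self, *args):
        # Add each argument to the running total in the matching slot
        self.data = [a + float(b) for a, b in zip(self.data, args)]

    def __getitem__(self, idx):
        return self.data[idx]

metric = Accumulator(2)  # slot 0: correct predictions, slot 1: total predictions
metric.add(3, 4)         # hypothetical batch 1: 3 of 4 correct
metric.add(5, 6)         # hypothetical batch 2: 5 of 6 correct
overall_accuracy = metric[0] / metric[1]
print(metric.data, overall_accuracy)
```

Summing the counts first and dividing once at the end gives the accuracy over all examples, which is not the same as averaging per-batch accuracies when batch sizes differ.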

evaluate_accuracy(net, test_iter)

The result is as follows:

Training softmax regression

def train_epoch_ch3(net, train_iter, loss, updater):  #@save

    """Train the model for one epoch (defined in Chapter 3)."""

    # Set the model to training mode

    if isinstance(net, torch.nn.Module):

        net.train()

    # Sum of training loss, sum of training accuracy, number of examples

    metric = Accumulator(3)

    for X, y in train_iter:

        # Compute gradients and update parameters

        y_hat = net(X)

        l = loss(y_hat, y)

        if isinstance(updater, torch.optim.Optimizer):

            # Use PyTorch's built-in optimizer and loss function

            updater.zero_grad()

            l.mean().backward()

            updater.step()

        else:

            # Use the custom optimizer and loss function

            l.sum().backward()

            updater(X.shape[0])

        metric.add(float(l.sum()), accuracy(y_hat, y), y.numel())

    # Return the training loss and training accuracy

    return metric[0] / metric[2], metric[1] / metric[2]

# Define Animator, a utility class for plotting data in an animation

class Animator:  #@save

    """Plot data in an animation."""

    def __init__(self, xlabel=None, ylabel=None, legend=None, xlim=None,

                 ylim=None, xscale='linear', yscale='linear',

                 fmts=('-', 'm--', 'g-.', 'r:'), nrows=1, ncols=1,

                 figsize=(3.5, 2.5)):

        # Incrementally plot multiple lines

        if legend is None:

            legend = []

        d2l.use_svg_display()

        self.fig, self.axes = d2l.plt.subplots(nrows, ncols, figsize=figsize)

        if nrows * ncols == 1:

            self.axes = [self.axes, ]

        # Use a lambda function to capture the arguments

        self.config_axes = lambda: d2l.set_axes(

            self.axes[0], xlabel, ylabel, xlim, ylim, xscale, yscale, legend)

        self.X, self.Y, self.fmts = None, None, fmts

    def add(self, x, y):

        # Add multiple data points to the figure

        if not hasattr(y, "__len__"):

            y = [y]

        n = len(y)

        if not hasattr(x, "__len__"):

            x = [x] * n

        if not self.X:

            self.X = [[] for _ in range(n)]

        if not self.Y:

            self.Y = [[] for _ in range(n)]

        for i, (a, b) in enumerate(zip(x, y)):

            if a is not None and b is not None:

                self.X[i].append(a)

                self.Y[i].append(b)

        self.axes[0].cla()

        for x, y, fmt in zip(self.X, self.Y, self.fmts):

            self.axes[0].plot(x, y, fmt)

        self.config_axes()

        display.display(self.fig)

        display.clear_output(wait=True)

# Implement the training function

def train_ch3(net, train_iter, test_iter, loss, num_epochs, updater):  #@save

    """Train a model (defined in Chapter 3)."""

    animator = Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0.3, 0.9],

                        legend=['train loss', 'train acc', 'test acc'])

    for epoch in range(num_epochs):

        train_metrics = train_epoch_ch3(net, train_iter, loss, updater)

        test_acc = evaluate_accuracy(net, test_iter)

        animator.add(epoch + 1, train_metrics + (test_acc,))

    train_loss, train_acc = train_metrics

    assert train_loss < 0.5, train_loss

    assert train_acc <= 1 and train_acc > 0.7, train_acc

    assert test_acc <= 1 and test_acc > 0.7, test_acc

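# Use the minibatch stochastic gradient descent defined in the linear regression from scratch section to optimize the model's loss function, with a learning rate of 0.1.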

lr = 0.1

def updater(batch_size):

    return d2l.sgd([W, b], lr, batch_size)

# Train the model for 10 epochs

num_epochs = 10

train_ch3(net, train_iter, test_iter, cross_entropy, num_epochs, updater)

The result is as follows:

Predict image classes

def predict_ch3(net, test_iter, n=6):  #@save

    """Predict labels (defined in Chapter 3)."""

    for X, y in test_iter:

        break

    trues = d2l.get_fashion_mnist_labels(y)

    preds = d2l.get_fashion_mnist_labels(net(X).argmax(axis=1))

    titles = [true +'\n' + pred for true, pred in zip(trues, preds)]

    d2l.show_images(

        X[0:n].reshape((n, 28, 28)), 1, n, titles=titles[0:n])

predict_ch3(net, test_iter)

The result is as follows:

Concise implementation of softmax regression

import torch

from torch import nn

from d2l import torch as d2l

batch_size = 256

train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)

# Initialize the model parameters

# PyTorch does not implicitly reshape the inputs, so we

# define a flatten layer before the linear layer to reshape the network's inputs

net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))

def init_weights(m):

    if type(m) == nn.Linear:

        nn.init.normal_(m.weight, std=0.01)

net.apply(init_weights);

# Pass unnormalized predictions to the cross-entropy loss, which computes the softmax and its logarithm together

loss = nn.CrossEntropyLoss(reduction='none')
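Computing the softmax and its logarithm together is what makes the loss numerically safe: the log of the normalization constant can be evaluated with the log-sum-exp trick, so no large exponential is ever formed. The plain-Python sketch below shows the idea for a single example; cross_entropy_from_logits is a hypothetical helper written for this note, not the PyTorch implementation.

```python
import math

def cross_entropy_from_logits(logits, label):
    """Cross-entropy for one example, computed directly from logits.

    Uses -log softmax(o)[label] = logsumexp(o) - o[label].
    """
    m = max(logits)
    # Shift by the maximum so every exponent is <= 0 and cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(o - m) for o in logits))
    return log_sum_exp - logits[label]

# Logits this large would overflow a naive exp-then-normalize softmax
loss_value = cross_entropy_from_logits([1000.0, 1001.0, 1002.0], 2)
print(loss_value)
```

This is why the network in this section outputs raw scores and has no softmax layer of its own.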

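# Use minibatch stochastic gradient descent with a learning rate of 0.1 as the optimization algorithm

trainer = torch.optim.SGD(net.parameters(), lr=0.1)

# Call the training function defined earlier to train the model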

num_epochs = 10

d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, trainer)

The result is as follows: