TensorFlow 2 Introduction and Practice: CNN

Convolutional Neural Networks (CNN)

How CNNs Work

To explain how a CNN works, this article uses code to build an intuitive picture of each operation involved in convolution.

Convolution

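The convolutional layer is the core component of a CNN. A learnable kernel slides across the input feature map; at each position, the kernel and the input patch beneath it are multiplied element-wise and summed, producing one value of the output feature map. Multiple kernels run in parallel to extract different feature patterns. Convolutional layers are characterized by parameter sharing and local connectivity, which greatly reduce the number of parameters. As the network gets deeper, the extracted features become progressively more abstract, moving from low-level (edges) to high-level (objects).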

import tensorflow as tf

import numpy as np

image = np.array([

    [1, 0, 1, 0],

    [0, 1, 0, 1],

    [1, 0, 1, 0],

    [0, 1, 0, 1]

]).reshape(1, 4, 4, 1).astype(np.float32)

conv_layer = tf.keras.layers.Conv2D(

    filters=1,

    kernel_size=(2, 2),

    activation='relu',

    input_shape=(4, 4, 1)

)

output = conv_layer(image)

print("Input image shape:", image.shape)

print("Input image:\n", image.reshape(4, 4))

print("\nShape after convolution:", output.shape)

print("Convolution result:\n", output.numpy().reshape(3, 3))

Input image shape: (1, 4, 4, 1)

Input image:

 [[1. 0. 1. 0.]

 [0. 1. 0. 1.]

 [1. 0. 1. 0.]

 [0. 1. 0. 1.]]

Shape after convolution: (1, 3, 3, 1)

Convolution result:

 [[1.0863366  0.26357883 1.0863366 ]

 [0.26357883 1.0863366  0.26357883]

 [1.0863366  0.26357883 1.0863366 ]]
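The values printed above come from a randomly initialized kernel, so they change from run to run. To make the multiply-and-sum step concrete, here is a minimal NumPy sketch with a fixed kernel. The kernel values and the helper name `conv2d_valid` are assumptions chosen for illustration; this is not how Keras implements the layer internally, and bias and activation are left out.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1), dtype=np.float32)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # window under the kernel
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out

# Same checkerboard image as above, as a plain 2-D array
image = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=np.float32)

# Hypothetical fixed kernel that responds to the main diagonal of each patch
kernel = np.array([
    [1, 0],
    [0, 1],
], dtype=np.float32)

result = conv2d_valid(image, kernel)
print(result)
# [[2. 0. 2.]
#  [0. 2. 0.]
#  [2. 0. 2.]]
```

Only two distinct values appear because the checkerboard input contains only two distinct 2x2 patches, which is also why the Keras output above alternates between two numbers.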

Zero Padding

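Zero padding adds a border of zeros around the input feature map. Its main purpose is to control the output size after convolution and keep the feature map from shrinking too quickly. It also addresses the fact that edge pixels take part in fewer convolution windows than interior pixels, so edge information is used more fully. The amount of padding is usually determined by the kernel size. You can choose between the SAME mode (pad so that, at stride 1, the output has the same spatial size as the input) and the VALID mode (no padding).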

image = np.array([

    [1, 0, 1, 0],

    [0, 1, 0, 1],

    [1, 0, 1, 0],

    [0, 1, 0, 1]

]).reshape(1, 4, 4, 1).astype(np.float32)

conv_with_padding = tf.keras.layers.Conv2D(

    filters=1,

    kernel_size=(2, 2),

    padding='same',

    input_shape=(4, 4, 1)

)

output_padded = conv_with_padding(image)

print("Output shape with padding:", output_padded.shape)

print("Result with padding:\n", output_padded.numpy().reshape(4, 4))

Output shape with padding: (1, 4, 4, 1)

Result with padding:

 [[ 0.15471017 -0.10083783  0.15471017  0.5130797 ]

 [-0.10083783  0.15471017 -0.10083783 -0.60765034]

 [ 0.15471017 -0.10083783  0.15471017  0.5130797 ]

 [-0.6139175  -0.60765034 -0.6139175  -0.60765034]]
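The output sizes seen so far (3x3 without padding, 4x4 with padding='same') follow from simple arithmetic. The sketch below computes the output length along one dimension and the amount of zero padding the SAME mode needs. The function names are hypothetical, and the way padding is split between the two sides follows the convention TensorFlow documents for SAME padding, where any odd extra zero goes at the end.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

def same_padding(n, kernel, stride=1):
    """Zeros added (before, after) along one dimension in SAME mode."""
    out = math.ceil(n / stride)
    total = max((out - 1) * stride + kernel - n, 0)
    before = total // 2
    return before, total - before

print(conv_output_size(4, 2, padding="valid"))  # 3
print(conv_output_size(4, 2, padding="same"))   # 4
print(same_padding(4, 2))                       # (0, 1)
print(same_padding(28, 3))                      # (1, 1)
```

For the 2x2 kernel used above, the single row and column of zeros land at the bottom and right, which is consistent with the last row and column of the padded result differing from the interior values.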

Pooling

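A pooling layer downsamples the input feature map; max pooling is the most common variant. It outputs the maximum value within each fixed-size window, which reduces the spatial dimensions of the feature map. This lowers the amount of computation and provides a degree of translation invariance. Pooling layers have no learnable parameters; they compress feature information purely through a statistical operation.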

image = np.array([

    [2, 1, 3, 4],

    [5, 6, 7, 8],

    [1, 2, 3, 4],

    [5, 6, 7, 8]

]).reshape(1, 4, 4, 1).astype(np.float32)

max_pool = tf.keras.layers.MaxPooling2D(

    pool_size=(2, 2),

    strides=(2, 2)

)

pooled_output = max_pool(image)

print("Input image:\n", image.reshape(4, 4))

print("\nShape after pooling:", pooled_output.shape)

print("Pooling result:\n", pooled_output.numpy().reshape(2, 2))

Input image:

 [[2. 1. 3. 4.]

 [5. 6. 7. 8.]

 [1. 2. 3. 4.]

 [5. 6. 7. 8.]]

Shape after pooling: (1, 2, 2, 1)

Pooling result:

 [[6. 8.]

 [6. 8.]]
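The same result can be reproduced by hand. The following NumPy sketch splits the 4x4 image into non-overlapping 2x2 blocks and reduces each block; the reshape trick is an illustrative shortcut that only works when the window size equals the stride and divides the image size evenly. Average pooling is included for comparison and is not part of the original example.

```python
import numpy as np

image = np.array([
    [2, 1, 3, 4],
    [5, 6, 7, 8],
    [1, 2, 3, 4],
    [5, 6, 7, 8],
], dtype=np.float32)

# Axes after reshape: (block_row, row_in_block, block_col, col_in_block)
blocks = image.reshape(2, 2, 2, 2)

max_pooled = blocks.max(axis=(1, 3))   # largest value in each 2x2 block
avg_pooled = blocks.mean(axis=(1, 3))  # mean of each 2x2 block

print(max_pooled)
# [[6. 8.]
#  [6. 8.]]
print(avg_pooled)
# [[3.5 5.5]
#  [3.5 5.5]]
```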

Batch Normalization

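Batch normalization standardizes features over each mini-batch so that they have a mean of 0 and a variance of 1. Learnable scale and shift parameters then let the network adapt how strongly the standardization applies. The technique is intended to reduce internal covariate shift, speeds up training, and has a mild regularizing effect. At inference time, the layer normalizes with moving averages of the mean and variance accumulated during training, not with the statistics of the current batch.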

model = tf.keras.Sequential([

    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),

    tf.keras.layers.BatchNormalization(),

    tf.keras.layers.MaxPooling2D((2, 2))

])

sample_input = np.random.random((1, 28, 28, 1))

layer_outputs = []

for layer in model.layers:

    sample_input = layer(sample_input)

    layer_outputs.append(sample_input)

print("Statistics of the raw convolution output:")

print("Mean:", np.mean(layer_outputs[0]))

print("Std:", np.std(layer_outputs[0]))

print("\nStatistics after batch normalization:")

print("Mean:", np.mean(layer_outputs[1]))

print("Std:", np.std(layer_outputs[1]))

print("\nStatistics of the final pooling output:")

print("Mean:", np.mean(layer_outputs[2]))

print("Std:", np.std(layer_outputs[2]))

Statistics of the raw convolution output:

Mean: 0.103146255

Std: 0.1461332

Statistics after batch normalization:

Mean: 0.10309471

Std: 0.14606018

Statistics of the final pooling output:

Mean: 0.16544805

Std: 0.17202306
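In the output above, the statistics before and after BatchNormalization are almost identical. That is expected: calling a Keras layer directly, as the loop does, runs it in inference mode by default, so the layer uses its freshly initialized moving mean (0) and moving variance (1) instead of batch statistics. Normalization with batch statistics happens only in training mode, for example when the layer is called with training=True or during model.fit. The NumPy sketch below contrasts the two modes on a small random batch. It is a simplified illustration that normalizes per feature over the batch axis; epsilon=1e-3 and momentum=0.99 are the Keras defaults, and the batch shape is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 4))  # a mini-batch: 8 samples, 4 features

epsilon = 1e-3   # Keras default for BatchNormalization
momentum = 0.99  # Keras default for BatchNormalization
gamma, beta = np.ones(4), np.zeros(4)              # learnable scale and shift (initial values)
moving_mean, moving_var = np.zeros(4), np.ones(4)  # initial moving statistics

# Inference mode: normalize with the moving statistics
y_infer = gamma * (x - moving_mean) / np.sqrt(moving_var + epsilon) + beta

# Training mode: normalize with the statistics of the current batch
batch_mean, batch_var = x.mean(axis=0), x.var(axis=0)
y_train = gamma * (x - batch_mean) / np.sqrt(batch_var + epsilon) + beta

# Training mode also nudges the moving statistics toward the batch statistics
moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean
moving_var = momentum * moving_var + (1 - momentum) * batch_var

print("input mean:", x.mean(), "| inference-mode mean:", y_infer.mean())
print("training-mode mean per feature:", y_train.mean(axis=0))
print("training-mode std per feature:", y_train.std(axis=0))
```

With the initial moving statistics, inference mode only divides the input by sqrt(1 + epsilon), about 1.0005, which matches the tiny change between 0.103146255 and 0.10309471 in the output above.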

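Example

Data Processing

We work through an example to learn how to build a CNN model with TF that classifies Fashion MNIST clothing images.

The Fashion MNIST dataset contains 70,000 single-channel grayscale images of 28*28 pixels each, spread across ten classes.

The following code imports and loads the Fashion MNIST data directly from TensorFlow and displays the first image: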

import tensorflow as tf

import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.layers import Dense,Flatten,Conv2D,MaxPooling2D,BatchNormalization,Activation,Dropout,Add,Input,GlobalAveragePooling2D # type: ignore

from tensorflow.keras import Model # type: ignore

fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train,y_train),(x_test,y_test) = fashion_mnist.load_data()

print("x_train.shape:", x_train.shape)

plt.imshow(x_train[0], cmap='gray')

plt.colorbar()

plt.show()

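The advantage of a built-in dataset is that, as a beginner, you don't have to worry about constructing a dataset and can focus on building the model and implementing the code. A later practice section will cover in detail how to create and process your own dataset and load it quickly with the high-level API. The built-in Fashion MNIST is already split into training and test sets with their corresponding labels: x denotes the inputs and y the labels, while train and test denote the training and test sets.

The terminal shows the following output:

x_train.shape: (60000, 28, 28)

This means the training set contains 60,000 samples, each a 28*28 grayscale image.

The first image in the training set is displayed as follows:

[Figure: the first training image]

Before training, the grayscale images need to be normalized. Since pixel values lie between 0 and 255, dividing by 255.0 scales them to the range 0 to 1.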

x_train = x_train/255.0

x_test = x_test/255.0

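A convolutional neural network expects data in the format (batch, height, width, channel). The dataset consists of grayscale images, which have a single channel that is not stored explicitly, so we add one dimension to the raw data to represent the channel.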

x_train = x_train[..., tf.newaxis]  # [60000, 28, 28] -> [60000, 28, 28, 1]

x_test = x_test[..., tf.newaxis]    # [10000, 28, 28] -> [10000, 28, 28, 1]
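The two preprocessing steps can be checked on a small stand-in array without downloading the dataset. In this sketch the random `fake_images` array is an assumption that only mimics the dtype and shape of the real data; `np.newaxis` plays the same role as `tf.newaxis`, since both are aliases for None.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a few Fashion MNIST images: uint8 pixels in [0, 255]
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

scaled = fake_images / 255.0            # uint8 -> float64 in [0, 1]
with_channel = scaled[..., np.newaxis]  # (5, 28, 28) -> (5, 28, 28, 1)

print(fake_images.dtype, fake_images.shape)
print(with_channel.dtype, with_channel.shape)
print(with_channel.min(), with_channel.max())
```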

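Building the Model

Convolution is feature extraction, and a handy mnemonic for building a convolutional block is "CBAPD" (source: Cao Jian, Peking University): 'C' is the convolutional layer; 'B' is the batch normalization layer, placed between the convolutional and activation layers, which normalizes the data, speeds up training, and provides some regularization; 'A' is the activation layer; 'P' is the pooling layer; 'D' is the Dropout layer, which prevents overfitting by randomly "dropping" some neurons during training (setting their outputs to 0), reducing co-dependence between neurons.

Fully connected layers follow the convolutional layers as the model's output, converting the learned feature maps into the final classification result.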
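To illustrate what "dropping" neurons means, here is a minimal NumPy sketch of dropout in training mode. The helper name `dropout` and the all-ones input are assumptions for illustration. Like the Keras Dropout layer, it rescales the surviving activations by 1 / (1 - rate) so that the expected sum stays unchanged; at inference time dropout is simply the identity.

```python
import numpy as np

def dropout(x, rate, rng):
    """Zero each element with probability `rate` and rescale the rest."""
    keep_mask = rng.random(x.shape) >= rate  # True where the unit is kept
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(1000)
dropped = dropout(activations, rate=0.2, rng=rng)

print("fraction zeroed:", (dropped == 0).mean())     # close to 0.2
print("value of kept units:", dropped[dropped != 0][0])
print("mean after dropout:", dropped.mean())          # close to 1.0
```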

The encapsulated network model is as follows:

class CNN_model(Model):

    def __init__(self):

        super(CNN_model,self).__init__()

        self.conv1 = Conv2D(32,(3,3),padding='same')

        self.bn1 = BatchNormalization()

        self.relu1 = Activation('relu')

        self.pool1 = MaxPooling2D((2,2),strides=2,padding='same')

        self.conv2 = Conv2D(64,(3,3),padding='same')

        self.bn2 = BatchNormalization()

        self.relu2 = Activation('relu')

        self.pool2 = MaxPooling2D((2,2),strides=2,padding='same')

        self.flatten = Flatten()

        self.dropout1 = Dropout(0.2)

        self.f1 = Dense(128,activation='relu')

        self.dropout2 = Dropout(0.2)

        self.f2 = Dense(10,kernel_regularizer=tf.keras.regularizers.l2(0.01))

    def call(self,x):

        x = self.conv1(x)

        x = self.bn1(x)

        x = self.relu1(x)

        x = self.pool1(x)

        x = self.dropout1(x)

        x = self.conv2(x)

        x = self.bn2(x)

        x = self.relu2(x)

        x = self.pool2(x)

        x = self.flatten(x)

        x = self.dropout2(x)

        x = self.f1(x)

        x = self.f2(x)

        return x

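Use model.summary() to inspect the network structure and parameter counts. The model has to be instantiated and built first:

model = CNN_model()

model.build(input_shape=(None, 28, 28, 1))

model.summary()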

Model: "cnn_model"

_________________________________________________________________

 Layer (type)                Output Shape              Param #

=================================================================

 conv2d (Conv2D)             multiple                  320

 batch_normalization (Batch  multiple                  128

 Normalization)

 activation (Activation)     multiple                  0

 max_pooling2d (MaxPooling2  multiple                  0

 D)

 conv2d_1 (Conv2D)           multiple                  18496

 batch_normalization_1 (Bat  multiple                  256

 chNormalization)

 activation_1 (Activation)   multiple                  0

 max_pooling2d_1 (MaxPoolin  multiple                  0

 g2D)

 flatten (Flatten)           multiple                  0

 dropout (Dropout)           multiple                  0

 dense (Dense)               multiple                  401536

 dropout_1 (Dropout)         multiple                  0

 dense_1 (Dense)             multiple                  1290

=================================================================

Total params: 422026 (1.61 MB)

Trainable params: 421834 (1.61 MB)

Non-trainable params: 192 (768.00 Byte)

_________________________________________________________________

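The model summary shows that the CNN designed in this article has 422,026 parameters in total, about 1.61 MB. Of these, 421,834 are trainable parameters, which are updated continually during training to improve model performance. The remaining 192 are non-trainable parameters: the moving mean and moving variance tracked by the batch normalization layers.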
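The counts in the summary can be reproduced by hand. The sketch below uses the standard parameter formulas for each layer type; the helper function names are hypothetical, and the flattened size of 7*7*64 assumes the 28x28 input is halved twice by the two stride-2 pooling layers.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One kernel of size kh*kw*in_channels plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_features, units):
    # One weight per input plus one bias per unit
    return (in_features + 1) * units

def batchnorm_params(channels):
    # Trainable: gamma and beta. Non-trainable: moving mean and moving variance.
    return 2 * channels, 2 * channels

conv1 = conv2d_params(3, 3, 1, 32)    # 320
conv2 = conv2d_params(3, 3, 32, 64)   # 18496
bn1_trainable, bn1_fixed = batchnorm_params(32)
bn2_trainable, bn2_fixed = batchnorm_params(64)

flat_features = 7 * 7 * 64            # 28 -> 14 -> 7 after two poolings
dense1 = dense_params(flat_features, 128)  # 401536
dense2 = dense_params(128, 10)             # 1290

non_trainable = bn1_fixed + bn2_fixed
trainable = conv1 + conv2 + bn1_trainable + bn2_trainable + dense1 + dense2
total = trainable + non_trainable

print(total, trainable, non_trainable)  # 422026 421834 192
```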

Model Training

We compile the model with the high-level Keras API in TF.

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=['SparseCategoricalAccuracy'])

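In the compile call:

Optimizer (optimizer): the 'adam' optimizer is an adaptive learning-rate algorithm that automatically adjusts the step size for each parameter.

Loss function (loss): SparseCategoricalCrossentropy, the standard loss for multi-class classification when the labels are integer class indices rather than one-hot vectors.

from_logits=True: indicates that the model outputs raw logits rather than probabilities that have already passed through softmax.

Metrics (metrics): SparseCategoricalAccuracy evaluates model performance by computing the fraction of samples whose predicted class matches the true class.

The right choice of these arguments depends on the problem. For multi-class classification with integer labels, SparseCategoricalCrossentropy is paired with SparseCategoricalAccuracy; you can also pass 'accuracy' as the metric. Later chapters summarize the argument choices and combinations for different tasks.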
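To show what the loss and metric compute, here is a minimal NumPy sketch of sparse categorical cross-entropy on logits and of the matching accuracy. The function names and the sample logits are assumptions for illustration. The sketch averages over the batch and ignores the L2 regularization term that the model's last Dense layer adds to the training loss.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(labels, logits):
    # Numerically stable log-softmax: subtract the row maximum first
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each sample
    return -log_probs[np.arange(len(labels)), labels].mean()

def sparse_categorical_accuracy(labels, logits):
    # A prediction is correct when the largest logit sits at the true class
    return (logits.argmax(axis=1) == labels).mean()

# Hypothetical logits for 3 samples and 3 classes, with integer labels
logits = np.array([
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.3],
    [0.0, 0.0, 0.0],
])
labels = np.array([0, 1, 1])

loss = sparse_categorical_crossentropy_from_logits(labels, logits)
accuracy = sparse_categorical_accuracy(labels, logits)
print("loss:", loss)
print("accuracy:", accuracy)
```

The third sample has identical logits for every class, so its individual loss is log(3), the value expected from a model that guesses uniformly among three classes.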

model.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test),batch_size=128,verbose=1,validation_freq=1)

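In the training call:

Training data: x_train and y_train are the training-set inputs and labels.

epochs=10: trains for 10 epochs, meaning the entire training set is processed 10 times.

validation_data: specifies the validation data (x_test, y_test), used to evaluate how the model performs on data it has not been trained on. Here the test set doubles as the validation set.

batch_size=128: a hyperparameter; each gradient update uses 128 samples.

verbose=1: sets the verbosity of the training output; 1 shows a progress bar.

validation_freq=1: runs validation after every epoch, which helps monitor whether the model is overfitting.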
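The relationship between epochs, batch size, and the number of gradient updates can be worked out directly. This is plain arithmetic on the values used above, assuming the final partial batch of each epoch is still used for an update, which is the default behavior when fit is given NumPy arrays.

```python
import math

num_samples = 60000  # size of the Fashion MNIST training set
batch_size = 128
epochs = 10

steps_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)  # 469
print(last_batch_size)  # 96
print(total_updates)    # 4690
```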

Complete Code

import tensorflow as tf

import numpy as np

import matplotlib.pyplot as plt

from tensorflow.keras.layers import Dense,Flatten,Conv2D,MaxPooling2D,BatchNormalization,Activation,Dropout,Add,Input,GlobalAveragePooling2D # type: ignore

from tensorflow.keras import Model # type: ignore

fashion_mnist = tf.keras.datasets.fashion_mnist

(x_train,y_train),(x_test,y_test) = fashion_mnist.load_data()

# print("x_train.shape:",x_train.shape)

# plt.imshow(x_train[0], cmap='gray')

# plt.colorbar()

# plt.show()

x_train = x_train/255.0

x_test = x_test/255.0

x_train = x_train[..., tf.newaxis]  # [60000, 28, 28] -> [60000, 28, 28, 1]

x_test = x_test[..., tf.newaxis]    # [10000, 28, 28] -> [10000, 28, 28, 1]

# Build the model following the CBAPD pattern

class CNN_model(Model):

    def __init__(self):

        super(CNN_model,self).__init__()

        self.conv1 = Conv2D(filters=32,kernel_size=(3,3),padding='same')

        self.bn1 = BatchNormalization()

        self.relu1 = Activation('relu')

        self.pool1 = MaxPooling2D((2,2),strides=2,padding='same')

        self.conv2 = Conv2D(64,(3,3),padding='same')

        self.bn2 = BatchNormalization()

        self.relu2 = Activation('relu')

        self.pool2 = MaxPooling2D((2,2),strides=2,padding='same')

        self.flatten = Flatten()

        self.dropout1 = Dropout(0.2)

        self.f1 = Dense(128,activation='relu')

        self.dropout2 = Dropout(0.2)

        self.f2 = Dense(10,kernel_regularizer=tf.keras.regularizers.l2(0.01))

    def call(self,x):

        x = self.conv1(x)

        x = self.bn1(x)

        x = self.relu1(x)

        x = self.pool1(x)

        x = self.dropout1(x)

        x = self.conv2(x)

        x = self.bn2(x)

        x = self.relu2(x)

        x = self.pool2(x)

        x = self.flatten(x)

        x = self.dropout2(x)

        x = self.f1(x)

        x = self.f2(x)

        return x

model = CNN_model()

model.build(input_shape=(None, 28, 28, 1))

model.summary()

model.compile(optimizer='adam',loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=['SparseCategoricalAccuracy'])

model.fit(x_train,y_train,epochs=10,validation_data=(x_test,y_test),batch_size=128,verbose=1,validation_freq=1)