Deep Learning Basics: Building and Training a Perceptron with NumPy

1. The perceptron model

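The perceptron is a linear binary classifier, so it only works on linearly separable data. Its decision function is

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

where w is the weight vector, b is the bias, and sign(·) outputs +1 for a positive argument and -1 otherwise.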

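It searches the hypothesis space made up of all separating hyperplanes for one that separates the training data.
As a result, the hyperplane it finds is not necessarily optimal: it is simply one that happens to fit the training data, and which one it finds depends on the initial parameters and on the order in which misclassified points are visited.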
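
To see why the result is not unique, here is a minimal sketch on a hypothetical toy dataset (the four corners of the unit square labeled by an OR gate, with -1 meaning "false"). Two different (w, b) pairs both separate the training set perfectly, yet they disagree on an unseen point. The helper names `sign` and `f` are illustrative, not from the original code.

```python
import numpy as np


def sign(z):
    # Same convention as the perceptron's predict(): values <= 0 map to -1
    return np.where(z > 0, 1, -1)


def f(X, w, b):
    # Perceptron decision function f(x) = sign(w . x + b), applied row-wise
    return sign(X @ w + b)


# Hypothetical toy data: OR-gate truth table with labels in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, 1])

# Two different hyperplanes that both separate the training set
w1, b1 = np.array([0.5, 0.5]), -0.2
w2, b2 = np.array([1.0, 1.0]), -0.9

print(f(X, w1, b1))  # [-1  1  1  1]
print(f(X, w2, b2))  # [-1  1  1  1]

# ...but they classify an unseen point differently
x_new = np.array([[0.3, 0.3]])
print(f(x_new, w1, b1), f(x_new, w2, b2))  # [1] [-1]
```

Both hyperplanes are equally good as far as the training loss is concerned; nothing in the perceptron's objective prefers one over the other.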

2. Learning

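The perceptron's learning strategy is to minimize the total distance from the misclassified points to the separating hyperplane. Parameters are therefore updated only when a training point is misclassified.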
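
A sketch of the standard derivation that turns this strategy into the update rule used in `update()` (here M denotes the set of points misclassified by the current (w, b), labels are y_i in {-1, +1}, and η is the learning rate, `lr` in the code). The distance from x_i to the hyperplane w·x + b = 0 is |w·x_i + b| / ‖w‖, and for a misclassified point -y_i(w·x_i + b) ≥ 0, so |w·x_i + b| = -y_i(w·x_i + b). The total distance is

$$
\frac{1}{\lVert w \rVert} \sum_{x_i \in M} \lvert w \cdot x_i + b \rvert = -\frac{1}{\lVert w \rVert} \sum_{x_i \in M} y_i \,(w \cdot x_i + b)
$$

Dropping the constant factor 1/‖w‖ gives the perceptron loss and its gradients:

$$
L(w, b) = -\sum_{x_i \in M} y_i \,(w \cdot x_i + b), \qquad
\nabla_w L = -\sum_{x_i \in M} y_i \, x_i, \qquad
\nabla_b L = -\sum_{x_i \in M} y_i
$$

A stochastic gradient descent step on a single misclassified point (x_i, y_i) is then

$$
w \leftarrow w + \eta \, y_i \, x_i, \qquad b \leftarrow b + \eta \, y_i
$$

which is exactly what `self.w + self.lr*y*x` and `self.b + self.lr*y` compute. Note that the rule only makes sense with labels in {-1, +1}: a label of 0 would leave the parameters unchanged.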

3. A perceptron implementation in NumPy

```python
# coding: utf-8
import numpy as np


def prepare_data(n=100, input_size=2):
    # Label random points with the linear rule of an OR-gate perceptron:
    # +1 if 0.5*x1 + 0.5*x2 - 0.2 > 0, otherwise -1.
    # Labels are in {-1, +1} to match predict() and the update rule.
    def OR(x):
        w = np.array([0.5, 0.5])
        b = -0.2
        tmp = np.sum(w * x) + b
        if tmp <= 0:
            return -1
        else:
            return 1

    inputs = np.random.randn(n, input_size)
    labels = np.array([OR(inputs[i]) for i in range(n)])
    return inputs, labels


class Perception:
    def __init__(self, input_size, lr=0.001):
        # Initialize weights and bias randomly
        self.w = np.random.randn(input_size)
        self.b = np.random.randn()  # scalar bias (a Python float)
        self.lr = lr

    def predict(self, x):
        tmp = np.sum(self.w * x) + self.b
        if tmp <= 0:
            return -1
        else:
            return 1

    def update(self, x, y):
        # SGD step on one misclassified sample (x, y); the rule follows from
        # differentiating the loss -y * (w.x + b), i.e. the (unnormalized)
        # distance from misclassified points to the hyperplane
        self.w = self.w + self.lr * y * x
        self.b = self.b + self.lr * y


n = 1000      # number of samples
ratio = 0.8   # train/test split ratio
input_size = 2

print("Preparing Data {}".format(n))
X, Y = prepare_data(n, input_size)
clip_num = int(n * ratio)
train_X, train_Y = X[:clip_num], Y[:clip_num]
test_X, test_Y = X[clip_num:], Y[clip_num:]

# Init model
lr = 0.005
model = Perception(input_size, lr)
s = model.predict(X[0])
print("Input: ({}, {}), Output: {}".format(X[0][0], X[0][1], s))

# Training
epochs = 100
for i in range(epochs):
    loss = 0
    wrong_index = []
    print("\nEpoch {}".format(i + 1))
    print("Forward Computing")
    # Forward pass: collect misclassified samples and accumulate the loss
    for idx in range(clip_num):
        pred_y = model.predict(train_X[idx])
        if pred_y != train_Y[idx]:
            wrong_index.append(idx)
            # For a misclassified point, |w.x + b| == -y * (w.x + b)
            tmp_loss = abs(float(np.sum(model.w * train_X[idx]) + model.b))
            loss += tmp_loss

    print("Wrong predict samples: {}, Loss: {}".format(len(wrong_index), loss))
    print("Learning")
    # Update on each sample that was misclassified in this pass
    for j in wrong_index:
        model.update(train_X[j], train_Y[j])

# Testing
wrong_num = 0
test_loss = 0
for j in range(test_X.shape[0]):
    pred_y = model.predict(test_X[j])
    if pred_y != test_Y[j]:
        tmp_loss = abs(float(np.sum(model.w * test_X[j]) + model.b))
        test_loss += tmp_loss
        wrong_num += 1
print("\nTest wrong predict samples: {}, Loss: {}".format(wrong_num, test_loss))
```
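
The training loop above first collects every sample misclassified in an epoch and then updates on each of them. A common alternative is the classic online perceptron, which updates immediately after each mistake and stops as soon as a full pass makes no mistake. The sketch below assumes labels in {-1, +1}, zero initialization, and a shuffled visiting order; `make_or_like_data` and `train_online` are hypothetical helpers. Points closer than `margin` to the true boundary are dropped so the classes have a positive gap, which is what the perceptron convergence theorem needs to guarantee termination in a finite number of updates.

```python
import numpy as np


def make_or_like_data(n, margin=0.2, seed=0):
    # Hypothetical helper: same labeling rule as prepare_data()
    # (w = [0.5, 0.5], b = -0.2), labels in {-1, +1}. Points within
    # `margin` of the true boundary are dropped to guarantee a positive gap.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, 2))
    score = X @ np.array([0.5, 0.5]) - 0.2
    keep = np.abs(score) >= margin
    return X[keep], np.where(score[keep] > 0, 1, -1)


def train_online(X, y, lr=1.0, max_epochs=1000, seed=0):
    # Online perceptron: update right after each misclassified sample,
    # visiting samples in a freshly shuffled order every epoch.
    rng = np.random.default_rng(seed)
    w, b = np.zeros(X.shape[1]), 0.0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) <= 0:  # misclassified or on the boundary
                w += lr * y[i] * X[i]
                b += lr * y[i]
                mistakes += 1
        if mistakes == 0:  # a full pass without updates: data separated
            return w, b, epoch
    return w, b, None


X, y = make_or_like_data(1000)
w, b, epochs_used = train_online(X, y)
pred = np.where(X @ w + b > 0, 1, -1)
print("converged after", epochs_used, "epochs; training errors:", int(np.sum(pred != y)))
```

With zero initialization the learning rate only rescales (w, b) and does not change which points are misclassified, so `lr=1.0` is used here for simplicity.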

4. Extensions of the perceptron

The perceptron is a linear model: it cannot learn nonlinear functions, so it is helpless on data that are not linearly separable.

For example, a perceptron can fit data generated by an AND gate, an OR gate, or a NOT gate, but it cannot handle data generated by an XOR gate.
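
This can be checked on the truth tables with a minimal sketch (hypothetical helper `train_perceptron`, zero initialization, labels in {-1, +1} with -1 standing for 0). The online perceptron reaches zero errors on AND, OR and NOT, while on XOR at least one row always stays wrong. A finite run alone does not prove impossibility; the guarantee comes from the fact that no single line separates the XOR classes.

```python
import numpy as np


def train_perceptron(X, y, lr=1.0, max_epochs=100):
    # Online perceptron with zero initialization; labels in {-1, +1}
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:
                w += lr * yi * xi
                b += lr * yi
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


def errors(X, y, w, b):
    # Number of truth-table rows the learned hyperplane gets wrong
    return int(np.sum(np.where(X @ w + b > 0, 1, -1) != y))


X2 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
X1 = np.array([[0], [1]], dtype=float)
gates = {
    "AND": (X2, np.array([-1, -1, -1, 1])),
    "OR": (X2, np.array([-1, 1, 1, 1])),
    "NOT": (X1, np.array([1, -1])),
    "XOR": (X2, np.array([-1, 1, 1, -1])),
}

results = {}
for name, (X, y) in gates.items():
    w, b = train_perceptron(X, y)
    results[name] = errors(X, y, w, b)
    print(name, "errors:", results[name])
```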

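Logistic regression (LR) and support vector machines (SVM) can both be developed from the perceptron. It is also worth noting that although a single perceptron has limited expressive power, stacking several perceptrons in layers gives much stronger expressive power. This is captured by the universal approximation theorem for the multilayer perceptron (MLP): a two-layer MLP (one hidden layer) with enough hidden units and a suitable nonlinear activation can approximate any continuous function on a compact domain to arbitrary accuracy.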

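The book 《深度学习入门：基于Python的理论与实现》 (the Chinese edition of Koki Saito's "Deep Learning from Scratch") gives an intuitive example: use three perceptrons to implement a NAND gate, an OR gate and an AND gate, then combine them with basic digital-logic reasoning into an XOR gate, XOR(x1, x2) = AND(NAND(x1, x2), OR(x1, x2)). The result is a two-layer perceptron.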
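
A minimal sketch of that composition, assuming hand-picked weights (any values that realize each gate would do; the OR weights match the ones used in `prepare_data()`). Each gate is a single perceptron with 0/1 outputs, and NAND is simply NOT applied to AND. Feeding the outputs of the first layer (NAND, OR) into a second-layer AND yields XOR, which no single perceptron can represent.

```python
import numpy as np


def gate(w, b):
    # Build a single perceptron with fixed weights; outputs 0 or 1
    w = np.array(w)

    def g(x1, x2):
        return 1 if np.sum(w * np.array([x1, x2])) + b > 0 else 0

    return g


AND = gate([0.5, 0.5], -0.7)
NAND = gate([-0.5, -0.5], 0.7)
OR = gate([0.5, 0.5], -0.2)


def XOR(x1, x2):
    # Layer 1 computes NAND and OR; layer 2 combines them with AND
    return AND(NAND(x1, x2), OR(x1, x2))


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", XOR(x1, x2))
```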