Hung-yi Lee (李宏毅) Gradient Descent Demo: Code Walkthrough

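What is gradient descent? Put plainly: take the derivative of the loss with respect to each parameter (applying the chain rule), then keep updating each parameter by a small step in the direction opposite to its derivative.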
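To make the update rule concrete before turning to the course demo, here is a minimal sketch on a one-variable function. The function f(x) = (x - 3)^2, the helper name `gradient_descent`, and the step size 0.1 are all choices made for this illustration; none of them comes from the course code.

```python
def gradient_descent(grad, start, lr=0.1, steps=100):
    """Repeatedly step against the gradient and return every visited point."""
    x = start
    history = [x]
    for _ in range(steps):
        x = x - lr * grad(x)   # move opposite to the derivative
        history.append(x)
    return history


# f(x) = (x - 3) ** 2 has derivative 2 * (x - 3) and its minimum at x = 3
path = gradient_descent(lambda x: 2.0 * (x - 3.0), start=0.0)
print(path[-1])
```

Each step shrinks the distance to the minimum by a constant factor here, which is why a fixed learning rate is enough for this toy case.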

The code explained here is the regression demo from class 4 of Professor Hung-yi Lee's machine learning course.

Loss function
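Reading the formula off the demo code, the loss and the two partial derivatives used in the update step can be written as follows. The notation is an assumption of this sketch: N = 10 is the number of data points and \eta is the learning rate (`lr` in the code).

```latex
\[
L(w, b) = \sum_{n=1}^{N} \bigl( y_n - (b + w\,x_n) \bigr)^2
\]
\[
\frac{\partial L}{\partial b} = \sum_{n=1}^{N} 2\,\bigl( y_n - b - w\,x_n \bigr)(-1),
\qquad
\frac{\partial L}{\partial w} = \sum_{n=1}^{N} 2\,\bigl( y_n - b - w\,x_n \bigr)(-x_n)
\]
\[
b \leftarrow b - \eta\,\frac{\partial L}{\partial b},
\qquad
w \leftarrow w - \eta\,\frac{\partial L}{\partial w}
\]
```

The contour plot in the code divides the same sum by N. That rescales the surface but does not move its minimum.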

The Python code is as follows:

import numpy as np
import matplotlib.pyplot as plt

# Model: y_data = b + w * x_data
x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]   # 10 values
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]  # 10 values

x = np.arange(-200, -100, 1)     # candidate values of the bias b
y = np.arange(-5, 5, 0.1)        # candidate values of the weight w
z = np.zeros((len(y), len(x)))   # loss grid: one row per w, one column per b (100 x 100 here)
# X, Y = np.meshgrid(x, y)       # not needed: plt.contourf below accepts the 1-D x and y directly

for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        z[j][i] = 0
        for n in range(len(x_data)):
            # z[j][i] is the value of the loss function at b = x[i], w = y[j]
            z[j][i] = z[j][i] + (y_data[n] - b - w * x_data[n]) ** 2
        z[j][i] = z[j][i] / len(x_data)    # average the squared errors

# Model: y_data = b + w * x_data
b = -120              # initial b
w = -4                # initial w
lr = 0.0000001        # learning rate
iteration = 100000    # number of iterations

# store initial values for plotting
b_history = [b]
w_history = [w]

# iterations
for i in range(iteration):      # run 100000 iterations and look at the final result
    b_grad = 0.0                # reset the accumulated gradient for b
    w_grad = 0.0                # reset the accumulated gradient for w
    for n in range(len(x_data)):
        # The derivative is taken of the loss function, so the variables are w and b:
        # each gradient describes how the loss changes as w or b moves along its own axis.
        # Note that this is the gradient of the summed (not averaged) squared error.
        b_grad = b_grad + 2.0 * (y_data[n] - b - w * x_data[n]) * (-1.0)
        w_grad = w_grad + 2.0 * (y_data[n] - b - w * x_data[n]) * (-x_data[n])

    # update parameters
    b = b - lr * b_grad
    w = w - lr * w_grad

    # store parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contourf(x, y, z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
# orange cross: the least-squares optimum, (b, w) = (-188.4, 2.67)
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
# black line: the path taken by gradient descent
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
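The target point (-188.4, 2.67) is hard-coded in the plotting call. One way to check where it comes from is an ordinary least-squares line fit on the same data. The sketch below uses `np.polyfit`; the names `w_star` and `b_star` are introduced here only for illustration.

```python
import numpy as np

x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

# A degree-1 polynomial fit returns [slope, intercept], i.e. [w, b]
w_star, b_star = np.polyfit(x_data, y_data, 1)
print(round(float(b_star), 1), round(float(w_star), 2))
```

The printed pair should agree with the orange cross in the plot to the precision shown there.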

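[Figure: result with learning rate lr = 0.0000001]

[Figure: result with learning rate lr = 0.000001]

[Figure: result with learning rate lr = 0.00001]

As these results show, none of the three settings works well, so the way the learning rate is chosen is changed: the version below gives b and w their own learning rates.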
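The script that follows makes this change by dividing a base learning rate by the square root of each parameter's accumulated squared gradients, which is the Adagrad update. In notation assumed for this sketch, g_b^{(t)} and g_w^{(t)} are the gradients at iteration t and \eta is the base rate (`lr = 1` in the code):

```latex
\[
b^{(t+1)} = b^{(t)} - \frac{\eta}{\sqrt{\sum_{i=0}^{t} \bigl( g_b^{(i)} \bigr)^2}}\; g_b^{(t)},
\qquad
w^{(t+1)} = w^{(t)} - \frac{\eta}{\sqrt{\sum_{i=0}^{t} \bigl( g_w^{(i)} \bigr)^2}}\; g_w^{(t)}
\]
```

Because the w-gradient is scaled by the x values, it is far larger than the b-gradient. Normalizing each parameter by its own gradient history lets both move at a comparable pace.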

import numpy as np
import matplotlib.pyplot as plt

# Model: y_data = b + w * x_data
x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]   # 10 values
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]  # 10 values

x = np.arange(-200, -100, 1)     # candidate values of the bias b
y = np.arange(-5, 5, 0.1)        # candidate values of the weight w
z = np.zeros((len(y), len(x)))   # loss grid: one row per w, one column per b (100 x 100 here)
# X, Y = np.meshgrid(x, y)

for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        z[j][i] = 0
        for n in range(len(x_data)):
            # z[j][i] is the value of the loss function at b = x[i], w = y[j]
            z[j][i] = z[j][i] + (y_data[n] - b - w * x_data[n]) ** 2
        z[j][i] = z[j][i] / len(x_data)    # average the squared errors

# Model: y_data = b + w * x_data
b = -120              # initial b
w = -4                # initial w
lr = 1                # base learning rate
iteration = 100000    # number of iterations

# store initial values for plotting
b_history = [b]
w_history = [w]

# separate learning rates for w and b: running sums of their squared gradients
lr_b = 0
lr_w = 0

# iterations
for i in range(iteration):      # run 100000 iterations and look at the final result
    b_grad = 0.0                # reset the accumulated gradient for b
    w_grad = 0.0                # reset the accumulated gradient for w
    for n in range(len(x_data)):
        # The derivative is taken of the loss function L, so the variables are w and b:
        # each gradient describes how the loss changes as w or b moves along its own axis.
        b_grad = b_grad + 2.0 * (y_data[n] - b - w * x_data[n]) * (-1.0)
        w_grad = w_grad + 2.0 * (y_data[n] - b - w * x_data[n]) * (-x_data[n])

    lr_b = lr_b + b_grad ** 2
    lr_w = lr_w + w_grad ** 2

    # update parameters, each with its own scaled learning rate
    b = b - lr / np.sqrt(lr_b) * b_grad
    w = w - lr / np.sqrt(lr_w) * w_grad

    # store parameters for plotting
    b_history.append(b)
    w_history.append(w)

# plot the figure
plt.contourf(x, y, z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
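As a cross-check on the script above, the same per-parameter update can be written with NumPy arrays in place of the inner loop over data points. This is only a sketch: `adagrad_fit` is a name made up here, plotting is left out, and the starting point, base rate, and iteration count are copied from the script.

```python
import numpy as np

x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])


def adagrad_fit(x, y, b=-120.0, w=-4.0, lr=1.0, iterations=100000):
    """Hypothetical helper: Adagrad on the summed squared error of y = b + w * x."""
    sum_sq_b = 0.0   # running sum of squared b-gradients (lr_b in the script)
    sum_sq_w = 0.0   # running sum of squared w-gradients (lr_w in the script)
    for _ in range(iterations):
        residual = y - b - w * x
        b_grad = -2.0 * residual.sum()
        w_grad = -2.0 * (residual * x).sum()
        sum_sq_b += b_grad ** 2
        sum_sq_w += w_grad ** 2
        b -= lr / np.sqrt(sum_sq_b) * b_grad
        w -= lr / np.sqrt(sum_sq_w) * w_grad
    return b, w


b_final, w_final = adagrad_fit(x_data, y_data)
print(b_final, w_final)
```

If the update behaves as intended, the printed values should land near the orange cross at (-188.4, 2.67).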

[Figure: result with separate learning rates for b and w]

Some notes:

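The difference between np.array and np.asarray

Both array and asarray convert structured data (lists, tuples, and so on) to the ndarray type.

The main difference shows up when the source is already an ndarray: array still makes a copy by default and occupies new memory, whereas asarray returns the input without copying (as long as no dtype conversion is required).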
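A short sketch makes the difference visible. The variable names are chosen for this example only.

```python
import numpy as np

source = np.array([1.0, 2.0, 3.0])
copied = np.array(source)      # copies by default: a new buffer
viewed = np.asarray(source)    # already an ndarray of the right dtype: no copy

source[0] = 99.0               # modify the original in place

print(copied[0])               # unaffected by the change
print(viewed[0])               # reflects the change
print(viewed is source)
```

When a function only reads its input, `np.asarray` avoids an unnecessary copy. `np.array` is the safer choice when the result will be modified independently of the source.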

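What np.meshgrid does

It generates the coordinate matrices of a grid: given 1-D arrays x and y, it returns 2-D arrays X and Y such that the pairs (X[j, i], Y[j, i]) cover every grid point.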
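In the demo the meshgrid call is commented out, but it could be used to build the loss surface without the triple loop. The following is a sketch under that assumption, with `B`, `W`, and `residual` as names introduced here:

```python
import numpy as np

x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

x = np.arange(-200, -100, 1)   # bias values
y = np.arange(-5, 5, 0.1)      # weight values

# Default 'xy' indexing: both outputs have shape (len(y), len(x)),
# with B[j, i] == x[i] and W[j, i] == y[j]
B, W = np.meshgrid(x, y)

# Add a trailing axis so every grid point is broadcast against all data points
residual = y_data - B[..., None] - W[..., None] * x_data
z = (residual ** 2).mean(axis=-1)   # mean squared error at every (b, w)

print(B.shape, W.shape, z.shape)
```

The resulting `z` has one row per weight and one column per bias, which is the layout `plt.contourf(x, y, z, ...)` expects.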
