Ha, it's English time! Let's spend a few minutes learning a simple machine learning example through a simple passage.

Introduction

  • What is machine learning? You design methods for the machine to learn and improve by itself.
  • As a lead-in to machine learning methods, this passage introduces three ways to find the optimal k and b of a linear regression (y = k*x + b).
  • The data used is generated by ourselves.
  1. Self-sufficient data generation
  2. Random Chosen Method
  3. Supervised Direction Method
  4. Gradient Descent Method
  5. Conclusion

Self-sufficient Data Generation

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random

# Produce the data ourselves
age_with_fares = pd.DataFrame({
    "Fare": [263.0, 247.5208, 146.5208, 153.4625, 135.6333, 247.5208, 164.8667, 134.5, 135.6333, 153.4625, 134.5, 263.0, 211.5, 263.0, 151.55, 153.4625, 227.525, 211.3375, 211.3375],
    "Age": [23.0, 24.0, 58.0, 58.0, 35.0, 50.0, 31.0, 40.0, 36.0, 38.0, 41.0, 24.0, 27.0, 64.0, 25.0, 40.0, 38.0, 29.0, 43.0]
})
sub_fare = age_with_fares['Fare']
sub_age = age_with_fares['Age']

# Show our data
plt.scatter(sub_age, sub_fare)
plt.show()
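
Before fitting anything, it is worth a quick sanity check that Age and Fare are roughly linearly related. A small optional sketch of our own, not part of the original notebook (np.corrcoef is NumPy's correlation helper):

# Optional check: Pearson correlation between Age and Fare
# (a value near +1 or -1 suggests a linear fit is reasonable)
print(np.corrcoef(sub_age, sub_fare)[0, 1])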

def func(age, k, b): return k * age + b
def loss(y, yhat): return np.mean(np.abs(y - yhat))
# Here we use the mean absolute error (L1) as the loss; mean squared error (L2) and other losses are also possible.
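
For comparison, a minimal sketch of the L2 (mean squared error) alternative just mentioned; loss_l2 is a hypothetical name and is not used in the rest of this passage:

# A hypothetical L2 alternative to the L1 loss above (not used below);
# it punishes large errors more heavily than the L1 loss does.
def loss_l2(y, yhat): return np.mean((y - yhat) ** 2)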

Random Chosen Method

min_error_rate = float('inf')
loop_times = 10000
losses = []

def step(): return random.random() * 2 - 1
# random.random() gives a number in (0, 1); *2 maps it to (0, 2); -1 maps it to (-1, 1).
# Random generation plus looping is the driving force of the learning.

while loop_times > 0:
    k_hat = random.random() * 20 - 10
    b_hat = random.random() * 20 - 10
    estimated_fares = func(sub_age, k_hat, b_hat)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)
    if error_rate < min_error_rate:  # the self-supervision mechanism shows up here
        min_error_rate = error_rate
        losses.append(error_rate)
        best_k = k_hat
        best_b = b_hat
    loop_times -= 1

plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, best_k, best_b), c='r')
plt.show()

Show the loss change

plt.plot(range(len(losses)), losses)
plt.show()

Explanation

  • We can see the loss decrease, sometimes quickly, sometimes slowly, but it does decrease in the end.
  • One shortcoming of this method: the Random Chosen method is not very efficient, as it calls the random function a huge number of times.
  • Even when it comes across a better parameter, it may choose a worse one next time.
  • An improved method follows in the next part.

Supervised Direction Method

change_directions = [
    (+1, -1),  # k increases, b decreases
    (+1, +1),
    (-1, -1),
    (-1, +1)
]

min_error_rate = float('inf')
loop_times = 10000
losses = []
best_direction = random.choice(change_directions)

# Define the size of each change (the step size)
def step(): return random.random() * 2 - 1
# random.random() gives a number in (0, 1); *2 maps it to (0, 2); -1 maps it to (-1, 1).
# change_directions already applies the +1/-1 (the direction flip), so the *2 - 1 could be dropped,
# but keeping it widens the choice of step values.

k_hat = random.random() * 20 - 10
b_hat = random.random() * 20 - 10
best_k, best_b = k_hat, b_hat

while loop_times > 0:
    k_delta_direction, b_delta_direction = best_direction or random.choice(change_directions)
    k_delta = k_delta_direction * step()
    b_delta = b_delta_direction * step()
    new_k = best_k + k_delta
    new_b = best_b + b_delta
    estimated_fares = func(sub_age, new_k, new_b)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)
    if error_rate < min_error_rate:  # supervised learning: keep what works
        min_error_rate = error_rate
        best_k, best_b = new_k, new_b
        best_direction = (k_delta_direction, b_delta_direction)
        losses.append(min_error_rate)
    else:
        # The new direction must not equal the old one
        best_direction = random.choice(list(set(change_directions) - {(k_delta_direction, b_delta_direction)}))
    loop_times -= 1

print("f(age) = {} * age + {}, with error rate: {}".format(best_k, best_b, error_rate))
plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, best_k, best_b), c='r')
plt.show()

Show the loss change

plt.plot(range(len(losses)), losses)
plt.show()

Explanation

  • The Supervised Direction method (the 2nd method) is better than the Random Chosen method (the 1st method).
  • The 2nd method introduces a supervision mechanism, which changes the parameters k and b more efficiently.
  • But the 2nd method cannot shrink its steps, so it cannot refine the parameters to a finer magnitude (see the sketch below).
  • Besides, the 2nd method has no notion of an extremum, so it cannot find the optimal parameters effectively.
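
One possible fix for the fixed-magnitude problem, sketched here under our own assumptions (decaying_step is a hypothetical helper, not part of the original notebook): scale the random step by the remaining loop budget, so early updates are coarse and later updates are fine.

# Hypothetical tweak: shrink the step as the loop budget runs out,
# so the search can refine k and b at a finer and finer magnitude.
def decaying_step(loop_times, total=10000):
    progress = loop_times / total  # goes from 1.0 down to 0.0
    return (random.random() * 2 - 1) * progress

Calling decaying_step(loop_times) in place of step() inside the while loop above would keep the supervision mechanism but let the updates become arbitrarily small near the end of the search.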

Gradient Descent Method

min_error_rate = float('inf')
loop_times = 10000
losses = []
learning_rate = 1e-1

k_hat = random.random() * 20 - 10
b_hat = random.random() * 20 - 10

# Partial (sub)derivative of the L1 loss with respect to k
def derivate_k(y, yhat, x):
    abs_values = [1 if (y_i - yhat_i) > 0 else -1 for y_i, yhat_i in zip(y, yhat)]
    return np.mean([a * -x_i for a, x_i in zip(abs_values, x)])

# Partial (sub)derivative of the L1 loss with respect to b
def derivate_b(y, yhat):
    abs_values = [1 if (y_i - yhat_i) > 0 else -1 for y_i, yhat_i in zip(y, yhat)]
    return np.mean([a * -1 for a in abs_values])

while loop_times > 0:
    # Move against the gradient, scaled by the learning rate
    k_delta = -1 * learning_rate * derivate_k(sub_fare, func(sub_age, k_hat, b_hat), sub_age)
    b_delta = -1 * learning_rate * derivate_b(sub_fare, func(sub_age, k_hat, b_hat))
    k_hat += k_delta
    b_hat += b_delta
    estimated_fares = func(sub_age, k_hat, b_hat)
    error_rate = loss(y=sub_fare, yhat=estimated_fares)
    losses.append(error_rate)
    loop_times -= 1

print('f(age) = {} * age + {}, with error rate: {}'.format(k_hat, b_hat, error_rate))
plt.scatter(sub_age, sub_fare)
plt.plot(sub_age, func(sub_age, k_hat, b_hat), c='r')
plt.show()

Show the loss change

plt.plot(range(len(losses)), losses)
plt.show()

Explanation

  • To fit an objective function to discrete data, we use the loss function to measure how good the fit is.
  • Minimizing the loss then becomes a problem of finding an extremum without constraints.
  • Therefore we conceive the idea of descending along the gradient of the objective (loss) function.
  • The gradient is the direction in which the directional derivative is maximal, i.e. the direction of steepest change.
  • When the gradient approaches 0, we have fit the objective function about as well as we can; the derivation below makes the gradient we use explicit.
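
To make the link between the code and the math explicit, here is the (sub)gradient of the L1 loss that derivate_k and derivate_b implement (strictly a subgradient, since |·| is not differentiable at 0):

L(k, b) = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - (k x_i + b) \right|

\frac{\partial L}{\partial k} = \frac{1}{n} \sum_{i=1}^{n} -\operatorname{sign}\big(y_i - (k x_i + b)\big)\, x_i
\qquad
\frac{\partial L}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} -\operatorname{sign}\big(y_i - (k x_i + b)\big)

The abs_values list in the code is exactly this sign term, and each iteration moves (k, b) one learning_rate-sized step against the gradient.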

Conclusion

  • Machine learning is a process in which the machine learns and improves through methods we design.
  • A plain random search is usually not efficient, but once we add a supervision mechanism it becomes efficient.
  • Gradient Descent finds the extremum, and hence the optimal parameters, efficiently.

Serious question for this article:

Why do you use machine learning methods instead of creating a y = k*x + b formula?

  • In some scenarios, a hand-crafted formula can't meet reality's needs, like the irrational elements in economic models.
  • When we have enough valid data, we can run a regression or classification model with machine learning methods.
  • We can also evaluate our machine learning model on test data, which helps us apply the model in real life (see the sketch after this list).
  • This is just an example, okay?
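
As a minimal sketch of that evaluation idea (assuming scikit-learn is available; train_test_split is its standard splitting helper, and this snippet is our addition, not part of the original notebook):

from sklearn.model_selection import train_test_split

# Hold out 30% of the rows as a test set
train_age, test_age, train_fare, test_fare = train_test_split(
    sub_age, sub_fare, test_size=0.3, random_state=0)

# Refit on (train_age, train_fare) -- e.g. rerun the gradient descent loop --
# then report the L1 loss on the held-out split:
test_error = loss(y=test_fare, yhat=func(test_age, k_hat, b_hat))
print("held-out error rate: {}".format(test_error))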

Reference for this article: Jupyter Notebook
