1. Variable definitions

m : training examples' count

\(X\) : design matrix. each row of \(X\) is a training example, each column of \(X\) is a feature

\[X =
\begin{pmatrix}
1 & x^{(1)}_1 & ... & x^{(1)}_n \\
1 & x^{(2)}_1 & ... & x^{(2)}_n \\
... & ... & ... & ... \\
1 & x^{(n)}_1 & ... & x^{(n)}_n \\
\end{pmatrix}\]

\[\theta =
\begin{pmatrix}
\theta_0 \\
\theta_1 \\
... \\
\theta_n \\
\end{pmatrix}\]

2. Hypothesis

\[x=
\begin{pmatrix}
x_0 \\
x_1 \\
... \\
x_n \\
\end{pmatrix}
\]

\[h_\theta(x) = g(\theta^T x) = g(x_0\theta_0 + x_1\theta_1 + ... + x_n\theta_n) = \frac{1}{1 + e^{(-\theta^Tx)}},
\]

sigmoid function

\[g(z) = \frac{1}{1 + e^{-z}},
\]

g = 1 ./ (1 + e .^ (-z));

3. Cost function

\[J(\theta) = \frac{1}{m}\sum_{i=1}^m[-y^{(i)}log(h_\theta(x^{(i)})) - (1-y^{(i)})log(1 - h_\theta(x^{(i)}))],
\]

vectorization edition of Octave

J = -(1 / m) * sum(y' * log(sigmoid(X * theta)) + (1 - y)' * log(1 - sigmoid(X * theta)));

4. Goal

find \(\theta\) to minimize \(J(\theta)\), \(\theta\) is a vector here

4.1 Gradient descent

\[\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j,
\]

repeat until convergence{

     \(\theta_j := \theta_j - \alpha \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j\)

}

vectorization

\(S\)

\[=
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} & h_\theta(x^{(2)})-y^{(2)} & ... & h_\theta(x^{(n)}-y^{(n)})
\end{pmatrix}
\begin{pmatrix}
x^{(1)}_0 & x^{(1)}_1 & ... & x^{(1)}_3 \\
x^{(2)}_0 & x^{(2)}_1 & ... & x^{(2)}_3 \\
... & ... & ... & ... \\
x^{(n)}_0 & x^{(n)}_1 & ... & x^{(n)}_3 \\
\end{pmatrix}
\]

\[=
\begin{pmatrix}
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0 &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_1 &
... &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_n
\end{pmatrix}
\]

\[\theta = \theta - S^T
\]

\[h_\theta(X) = g(X\theta) = \frac{1}{1 + e^{(-X\theta)}}
\]

\(X\theta\) is nx1, \(y\) is nx1

\(\frac{1}{1+e^{(-X\theta)}} - y\) is nx1

\[\frac{1}{1 + e^{(-X\theta)}} - y=
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} & h_\theta(x^{(2)})-y^{(2)} & ... & h_\theta(x^{(n)})-y^{(n)}
\end{pmatrix}
\]

\[\theta = \theta - \alpha(\frac{1}{1 + e^{(-X\theta)}} - y)X
\]

5. Regularized logistic regression

to avoid overfitting or underfitting

Cost function

\[J(\theta) = \frac{1}{m}\sum_{i=1}^m[-y^{(i)}log(h_\theta(x^{(i)})) - (1-y^{(i)})log(1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m} \sum_{j=1}^m \theta^2_j,
\]

Gradient descent

\[\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0,
\]

\[\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j, (j \ge 1)
\]

[Machine learning] Logistic regression的更多相关文章

  1. 机器学习---逻辑回归(二)(Machine Learning Logistic Regression II)

    在<机器学习---逻辑回归(一)(Machine Learning Logistic Regression I)>一文中,我们讨论了如何用逻辑回归解决二分类问题以及逻辑回归算法的本质.现在 ...

  2. 机器学习---逻辑回归(一)(Machine Learning Logistic Regression I)

    逻辑回归(Logistic Regression)是一种经典的线性分类算法.逻辑回归虽然叫回归,但是其模型是用来分类的. 让我们先从最简单的二分类问题开始.给定特征向量x=([x1,x2,...,xn ...

  3. Machine Learning—Linear Regression

    Evernote的同步分享:Machine Learning-Linear Regression 版权声明:本文博客原创文章.博客,未经同意,不得转载.

  4. 机器学习---三种线性算法的比较(线性回归,感知机,逻辑回归)(Machine Learning Linear Regression Perceptron Logistic Regression Comparison)

    最小二乘线性回归,感知机,逻辑回归的比较:   最小二乘线性回归 Least Squares Linear Regression 感知机 Perceptron 二分类逻辑回归 Binary Logis ...

  5. [Machine Learning] logistic函数和softmax函数

    简单总结一下机器学习最常见的两个函数,一个是logistic函数,另一个是softmax函数,若有不足之处,希望大家可以帮忙指正.本文首先分别介绍logistic函数和softmax函数的定义和应用, ...

  6. 机器学习---线性回归(Machine Learning Linear Regression)

    线性回归是机器学习中最基础的模型,掌握了线性回归模型,有利于以后更容易地理解其它复杂的模型. 线性回归看似简单,但是其中包含了线性代数,微积分,概率等诸多方面的知识.让我们先从最简单的形式开始. 一元 ...

  7. [Machine Learning] Linear regression

    1. Variable definitions m : training examples' count \(y\) : \(X\) : design matrix. each row of \(X\ ...

  8. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  9. 机器学习---最小二乘线性回归模型的5个基本假设(Machine Learning Least Squares Linear Regression Assumptions)

    在之前的文章<机器学习---线性回归(Machine Learning Linear Regression)>中说到,使用最小二乘回归模型需要满足一些假设条件.但是这些假设条件却往往是人们 ...

随机推荐

  1. POJ - 3660 Cow Contest 传递闭包floyed算法

    Cow Contest POJ - 3660 :http://poj.org/problem?id=3660   参考:https://www.cnblogs.com/kuangbin/p/31408 ...

  2. CodeForces 758 C Unfair Poll

    Unfair Poll 题意:一共有n排同学每排同学有m个人, 老师问问题有一个顺序, 先从第一排开始问,问完第一排的所有同学之后,再问第2排的,对于所有排的访问顺序为 1,2,3……n-1,n,n- ...

  3. codeforces 764 C. Timofey and a tree(dfs+思维)

    题目链接:http://codeforces.com/contest/764/problem/C 题意:给出一个树,然后各个节点有对应的颜色,问是否存在以一个点为根节点子树的颜色都一样. 这里的子树颜 ...

  4. linux部署html代码到linux服务器,并进行域名解析

    本博客主要是说一下,如何将本地写好的html代码部署到linux服务器,并进行解析.下一篇博客将写一下,如何将html代码部署到阿里云服务器,并进行域名解析,以及在部署过程中遇到的问题和解决方法. 1 ...

  5. Python3-编码问题-解决为何我的python打印总是出现乱码??

    #python3 编码问题: ############举个例子############################### import sys print(sys.getdefaultencodi ...

  6. Winform中实现更改DevExpress的RadioGroup的选项时更改其他控件(TextEdit、ColorPickEdit)的值

    场景 Winform中实现读取xml配置文件并动态配置ZedGraph的RadioGroup的选项: https://blog.csdn.net/BADAO_LIUMANG_QIZHI/article ...

  7. 7、创建图及图的遍历(java实现)

    1.顺序表用于图的深度优先遍历 public class SeqList { public final int MaxSize = 10; public Object list[]; public i ...

  8. 1、单链表的实现(java代码)

    1.创建链结构实体Node /** * 链表结构实体类 */ public class Node { Node next = null; //下一节点 int data; //节点数据 public ...

  9. 【学习笔记】第三章 python3核心技术与实践--Jupyter Notebook

    可能你已经知道,Python 在 14 年后的“崛起”,得益于机器学习和数学统计应用的兴起.那为什么 Python 如此适合数学统计和机器学习呢?作为“老司机”的我可以肯定地告诉你,Jupyter N ...

  10. springboot---redis缓存的使用

    1.下载redis安装包,解压到电脑 2.启动redis 3.springboot  application.properties中配置redis缓存 spring.redis.host=127.0. ...