1. Variable definitions

\(m\) : number of training examples

\(X\) : design matrix; each row of \(X\) is a training example, each column is a feature

\[X =
\begin{pmatrix}
1 & x^{(1)}_1 & ... & x^{(1)}_n \\
1 & x^{(2)}_1 & ... & x^{(2)}_n \\
... & ... & ... & ... \\
1 & x^{(m)}_1 & ... & x^{(m)}_n \\
\end{pmatrix}\]
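
In Octave, the intercept column \(x_0 = 1\) is prepended to the raw features. A minimal sketch, assuming the raw \(m \times n\) feature matrix is stored in X_raw (a name used here only for illustration):

m = size(X_raw, 1);        % number of training examples
X = [ones(m, 1) X_raw];    % prepend the column of ones (x_0 = 1)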

\[\theta =
\begin{pmatrix}
\theta_0 \\
\theta_1 \\
... \\
\theta_n \\
\end{pmatrix}\]

2. Hypothesis

\[x=
\begin{pmatrix}
x_0 \\
x_1 \\
... \\
x_n \\
\end{pmatrix}
\]

\[h_\theta(x) = g(\theta^T x) = g(x_0\theta_0 + x_1\theta_1 + ... + x_n\theta_n) = \frac{1}{1 + e^{-\theta^T x}},
\]

where \(g\) is the sigmoid function

\[g(z) = \frac{1}{1 + e^{-z}},
\]

g = 1 ./ (1 + exp(-z));   % element-wise, so z can be a scalar, vector, or matrix
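
Because \(g\) is element-wise, the hypothesis for all \(m\) training examples can be computed at once (a sketch, using the sigmoid function defined above):

h = sigmoid(X * theta);   % m x 1 vector of predictions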

3. Cost function

\[J(\theta) = \frac{1}{m}\sum_{i=1}^m[-y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1 - h_\theta(x^{(i)}))],
\]

Vectorized implementation in Octave:

h = sigmoid(X * theta);                                 % m x 1 predictions
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h));   % inner products replace the explicit sum

4. Goal

find \(\theta\) that minimizes \(J(\theta)\); note that \(\theta\) is a vector here

4.1 Gradient descent

\[\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j,
\]

repeat until convergence{

     \(\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x^{(i)}_j\)

}

(all \(\theta_j\) are updated simultaneously)
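
A direct Octave translation of one update step (a sketch; the learning rate alpha is assumed to be chosen by hand):

h = sigmoid(X * theta);            % m x 1 predictions under the current theta
grad = zeros(size(theta));         % (n+1) x 1
for j = 1:length(theta)
    grad(j) = (1 / m) * sum((h - y) .* X(:, j));
end
theta = theta - alpha * grad;      % simultaneous update of every theta_j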

Vectorized form:

Let \(S\) be the row vector that collects the summed gradient terms:

\[S =
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} & h_\theta(x^{(2)})-y^{(2)} & ... & h_\theta(x^{(m)})-y^{(m)}
\end{pmatrix}
\begin{pmatrix}
x^{(1)}_0 & x^{(1)}_1 & ... & x^{(1)}_n \\
x^{(2)}_0 & x^{(2)}_1 & ... & x^{(2)}_n \\
... & ... & ... & ... \\
x^{(m)}_0 & x^{(m)}_1 & ... & x^{(m)}_n \\
\end{pmatrix}
\]

\[=
\begin{pmatrix}
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0 &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_1 &
... &
\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_n
\end{pmatrix}
\]

\[\theta = \theta - \frac{\alpha}{m} S^T
\]

\[h_\theta(X) = g(X\theta) = \frac{1}{1 + e^{-X\theta}}
\]

\(X\theta\) is \(m \times 1\), \(y\) is \(m \times 1\)

\(\frac{1}{1+e^{-X\theta}} - y\) is \(m \times 1\)

\[\frac{1}{1 + e^{-X\theta}} - y=
\begin{pmatrix}
h_\theta(x^{(1)})-y^{(1)} \\
h_\theta(x^{(2)})-y^{(2)} \\
... \\
h_\theta(x^{(m)})-y^{(m)}
\end{pmatrix}
\]

\[\theta = \theta - \frac{\alpha}{m} X^T\left(\frac{1}{1 + e^{-X\theta}} - y\right)
\]
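
The same update in Octave (a sketch; alpha and num_iters are hand-picked hyperparameters introduced here for illustration):

for iter = 1:num_iters
    grad  = (1 / m) * X' * (sigmoid(X * theta) - y);   % (n+1) x 1 gradient
    theta = theta - alpha * grad;                      % simultaneous update
end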

5. Regularized logistic regression

Regularization shrinks the parameters to avoid overfitting (too large a \(\lambda\), however, causes underfitting).

Cost function

\[J(\theta) = \frac{1}{m}\sum_{i=1}^m[-y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1 - h_\theta(x^{(i)}))] + \frac{\lambda}{2m} \sum_{j=1}^n \theta^2_j,
\]

(note that \(\theta_0\) is not regularized: the second sum starts at \(j = 1\))

Gradient descent

\[\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_0,
\]

\[\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x^{(i)}_j + \frac{\lambda}{m}\theta_j, \quad (j \ge 1)
\]
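
Regularized cost and gradient in Octave (a sketch; with Octave's 1-based indexing, theta(1) is \(\theta_0\) and is left unregularized):

h = sigmoid(X * theta);
J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
    + (lambda / (2 * m)) * sum(theta(2:end) .^ 2);       % skip theta(1)
grad = (1 / m) * X' * (h - y);
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);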
