ML | Naive Bayes
What is naive Bayes
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
Naive Bayes is a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate preprocessing, it is competitive in this domain with more advanced methods including support vector machines.
In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable.
An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the features are assumed to be independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix.
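As a rough illustration of why this matters: with $n$ features and $k$ classes, a naive Bayes model with Gaussian features needs only $kn$ means and $kn$ variances (plus the $k$ class priors), whereas modelling each class with a full multivariate Gaussian would additionally require $k \cdot n(n-1)/2$ off-diagonal covariance entries.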
Abstractly, the probability model for a classifier is a conditional model
$p(C \vert F_1,\dots,F_n)\,$
over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features n is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
Using Bayes' theorem, this can be written
$p(C \vert F_1,\dots,F_n) = \frac{p(C) \ p(F_1,\dots,F_n\vert C)}{p(F_1,\dots,F_n)}. \,$
In plain English, using Bayesian probability terminology, the above equation can be written as
$\mbox{posterior} = \frac{\mbox{prior} \times \mbox{likelihood}}{\mbox{evidence}}. \,$
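As a small worked example with made-up numbers (purely illustrative): suppose 20% of incoming mail is spam, the word "offer" appears in 60% of spam messages and in 5% of legitimate ones, and the only feature is whether "offer" occurs. Then for a message containing "offer",
$p(\mbox{spam} \vert \mbox{offer}) = \frac{0.2 \times 0.6}{0.2 \times 0.6 + 0.8 \times 0.05} = \frac{0.12}{0.16} = 0.75.$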
In practice, only the numerator of this fraction matters: the denominator does not depend on C and the feature values are given, so it is effectively constant. The numerator is the joint probability $p(C, F_1, \dots, F_n)$, which the chain rule lets us expand as
$\begin{align}
p(C, F_1, \dots, F_n) & = p(C) \ p(F_1,\dots,F_n\vert C) \\
& = p(C) \ p(F_1\vert C) \ p(F_2,\dots,F_n\vert C, F_1) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3,\dots,F_n\vert C, F_1, F_2) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3\vert C, F_1, F_2) \ p(F_4,\dots,F_n\vert C, F_1, F_2, F_3) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ \dots p(F_n\vert C, F_1, F_2, F_3,\dots,F_{n-1})
\end{align}$
Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j\neq i$ given the category C. This means that
$\begin{align}
p(F_i \vert C, F_j) & = p(F_i \vert C)\,, \\
p(F_i \vert C, F_j,F_k) & = p(F_i \vert C)\,, \\
p(F_i \vert C, F_j,F_k,F_l) & = p(F_i \vert C)\,,
\end{align}$
and so on, for $i\ne j,k,l$. Thus, the joint model can be expressed as
$\begin{align}
p(C \vert F_1, \dots, F_n) & \varpropto p(C, F_1, \dots, F_n) \\
& \varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C) \ p(F_3\vert C) \ \cdots \\
& \varpropto p(C) \prod_{i=1}^n p(F_i \vert C)\,.
\end{align}$
This means that under the above independence assumptions, the conditional distribution over the class variable C is:
$p(C \vert F_1,\dots,F_n) = \frac{1}{Z} p(C) \prod_{i=1}^n p(F_i \vert C)$
where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only on $F_1,\dots,F_n$, that is, a constant if the values of the feature variables are known.
The naive Bayes model combines this probability model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function $\mathrm{classify}$ defined as follows:
$\mathrm{classify}(f_1,\dots,f_n) = \underset{c}{\operatorname{argmax}} \ p(C=c) \displaystyle\prod_{i=1}^n p(F_i=f_i\vert C=c).$
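A minimal sketch of this decision rule in Python, assuming the class priors and the per-feature conditional probabilities for one observation have already been looked up and stored in plain dictionaries (the variable names and toy numbers below are illustrative, not tied to any library):

import math

# Toy, hand-made parameters (illustrative only):
# priors[c] = p(C = c); likelihoods[c][i] = p(F_i = f_i | C = c) for the observed f_i.
priors = {"spam": 0.2, "ham": 0.8}
likelihoods = {
    "spam": [0.6, 0.3],
    "ham":  [0.05, 0.4],
}

def classify(priors, likelihoods):
    # Unnormalized scores p(C) * prod_i p(F_i = f_i | C), computed in log space
    # to avoid underflow when the number of features is large.
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
        for c in priors
    }
    # MAP decision: the class with the largest (log-)score.
    best = max(log_scores, key=log_scores.get)
    # The evidence Z is the sum of the unnormalized scores, so the posteriors
    # can be recovered by normalizing.
    scores = {c: math.exp(s) for c, s in log_scores.items()}
    z = sum(scores.values())
    posteriors = {c: s / z for c, s in scores.items()}
    return best, posteriors

print(classify(priors, likelihoods))  # ('spam', {'spam': ~0.69, 'ham': ~0.31})

Working in log space does not change which class wins, since the logarithm is monotonic.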
All model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set. These are maximum likelihood estimates of the probabilities. A class' prior may be calculated by assuming equiprobable classes (i.e., priors = 1 / (number of classes)), or by calculating an estimate for the class probability from the training set (i.e., (prior for a given class) = (number of samples in the class) / (total number of samples)). To estimate the parameters for a feature's distribution, one must assume a distribution or generate nonparametric models for the features from the training set.
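A rough sketch of these relative-frequency (maximum likelihood) estimates for discrete features, assuming the training set is given as a list of (features, label) pairs; the data layout and function name are assumptions made for illustration:

from collections import Counter, defaultdict

def fit_categorical_nb(samples):
    # samples: list of (features, label) pairs, features being a tuple of discrete values.
    class_counts = Counter(label for _, label in samples)
    total = len(samples)
    # prior for a class = (samples in the class) / (total samples)
    priors = {c: n / total for c, n in class_counts.items()}

    # feature_counts[c][i][v] = number of class-c samples whose i-th feature equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in samples:
        for i, value in enumerate(features):
            feature_counts[label][i][value] += 1

    # p(F_i = v | C = c) = count(c, i, v) / count(c)  (maximum likelihood estimate)
    likelihoods = {
        c: {i: {v: n / class_counts[c] for v, n in counter.items()}
            for i, counter in per_class.items()}
        for c, per_class in feature_counts.items()
    }
    return priors, likelihoods

# Illustrative toy data: two binary features, two classes.
data = [((1, 0), "spam"), ((1, 1), "spam"), ((0, 0), "ham"), ((0, 1), "ham"), ((0, 0), "ham")]
priors, likelihoods = fit_categorical_nb(data)
print(priors)               # {'spam': 0.4, 'ham': 0.6}
print(likelihoods["spam"])  # {0: {1: 1.0}, 1: {0: 0.5, 1: 0.5}}

Note that with this plain estimator, a feature value never seen in a class gets probability zero; smoothing schemes address that but are beyond this sketch.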
Algorithm
1. Estimate the prior probabilities (the class priors and the feature probability distributions) from the training set: $p(C)$ and the evidence $Z = p(F_1, \dots, F_n)$;
2. Assume a probability distribution for each feature in order to model $p(F_i \vert C)$;
When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.
Another common technique for handling continuous values is to use binning to discretize the feature values, to obtain a new set of Bernoulli-distributed features.
In general, the distribution method is a better choice if there is a small amount of training data, or if the precise distribution of the data is known. The discretization method tends to do better if there is a large amount of training data because it will learn to fit the distribution of the data. Since naive Bayes is typically used when a large amount of data is available (as more computationally expensive models can generally achieve better accuracy), the discretization method is generally preferred over the distribution method.
3. Compute the probability of each class for the sample and pick the class with the highest probability (a minimal end-to-end sketch follows this list).
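Putting the three steps together, here is a minimal end-to-end sketch of a Gaussian naive Bayes classifier in Python; the class name, the toy data, and the small variance floor are illustrative choices, not a reference implementation:

import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Minimal illustrative Gaussian naive Bayes; not optimized."""

    def fit(self, X, y):
        # Step 1: class priors p(C) as relative frequencies.
        by_class = defaultdict(list)
        for features, label in zip(X, y):
            by_class[label].append(features)
        n = len(X)
        self.priors = {c: len(rows) / n for c, rows in by_class.items()}

        # Step 2: per-class, per-feature mean and variance for the Gaussian p(F_i | C).
        self.stats = {}
        for c, rows in by_class.items():
            self.stats[c] = []
            for col in zip(*rows):
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # floor avoids zero variance
                self.stats[c].append((mean, var))
        return self

    def predict(self, x):
        # Step 3: MAP decision, computed in log space.
        def log_gauss(v, mean, var):
            return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

        scores = {
            c: math.log(self.priors[c])
               + sum(log_gauss(v, m, s2) for v, (m, s2) in zip(x, self.stats[c]))
            for c in self.priors
        }
        return max(scores, key=scores.get)

# Illustrative toy data: features = [height_cm, weight_kg].
X = [[180, 80], [175, 75], [160, 55], [158, 50]]
y = ["a", "a", "b", "b"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([170, 70]))  # expected: 'a'

If the features were discretized instead (the binning approach above), step 2 would simply count relative frequencies per bin, as in the earlier estimation sketch.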