What's Naive Bayes

In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.

Naive Bayes is a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate preprocessing, it is competitive in this domain with more advanced methods including support vector machines.
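
As a concrete illustration of the word-frequency setup, here is a minimal sketch using scikit-learn's `CountVectorizer` and `MultinomialNB`; the toy documents and labels are invented for this example and are not from the original post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus: each document is labeled spam or legitimate ("ham").
docs = [
    "win money now claim your prize",
    "meeting moved to tomorrow morning",
    "cheap money win win",
    "project schedule and meeting notes",
]
labels = ["spam", "ham", "spam", "ham"]

# Word frequencies as features.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Multinomial naive Bayes is the usual choice for word counts.
clf = MultinomialNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["win a prize now"])))         # likely ['spam']
print(clf.predict(vectorizer.transform(["notes from the meeting"])))  # likely ['ham']
```

`MultinomialNB` models the word counts directly; for binary word-presence features, `BernoulliNB` is the usual alternative.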

In simple terms, a naive Bayes classifier assumes that the value of a particular feature is unrelated to the presence or absence of any other feature, given the class variable. 

An advantage of naive Bayes is that it only requires a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed to be independent, only the variances of the variables for each class need to be determined, not the entire covariance matrix.
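
For example, with $n$ continuous features, a full multivariate Gaussian per class would require estimating an $n \times n$ covariance matrix (about $n(n+1)/2$ parameters), while the naive model needs only $n$ means and $n$ variances per class.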

Abstractly, the probability model for a classifier is a conditional model

$p(C \vert F_1,\dots,F_n)\,$
over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables $F_1$ through $F_n$. The problem is that if the number of features n is large or if a feature can take on a large number of values, then basing such a model on probability tables is infeasible. We therefore reformulate the model to make it more tractable.
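
For instance, with $n = 30$ binary features, a full table for $p(C \vert F_1,\dots,F_n)$ would need on the order of $2^{30} \approx 10^9$ entries per class, far too many to estimate from any realistic training set.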

Using Bayes' theorem, this can be written

$p(C \vert F_1,\dots,F_n) = \frac{p(C) \ p(F_1,\dots,F_n\vert C)}{p(F_1,\dots,F_n)}. \,$
In plain English, using Bayesian probability terminology, the above equation can be written as

$\mbox{posterior} = \frac{\mbox{prior} \times \mbox{likelihood}}{\mbox{evidence}}. \,$

In practice, only the numerator of that fraction matters: the denominator does not depend on $C$, and the values of the features $F_i$ are given, so the denominator is effectively constant. The numerator is equivalent to the joint probability model $p(C, F_1, \dots, F_n)$, which can be rewritten as follows using the chain rule:

$\begin{align}
p(C, F_1, \dots, F_n) & = p(C) \ p(F_1,\dots,F_n\vert C) \\
& = p(C) \ p(F_1\vert C) \ p(F_2,\dots,F_n\vert C, F_1) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3,\dots,F_n\vert C, F_1, F_2) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ p(F_3\vert C, F_1, F_2) \ p(F_4,\dots,F_n\vert C, F_1, F_2, F_3) \\
& = p(C) \ p(F_1\vert C) \ p(F_2\vert C, F_1) \ \dots p(F_n\vert C, F_1, F_2, F_3,\dots,F_{n-1})
\end{align}$

Now the "naive" conditional independence assumptions come into play: assume that each feature $F_i$ is conditionally independent of every other feature $F_j$ for $j\neq i$ given the category C. This means that

$p(F_i \vert C, F_j) = p(F_i \vert C)\,,$

$p(F_i \vert C, F_j, F_k) = p(F_i \vert C)\,,$

$p(F_i \vert C, F_j, F_k, F_l) = p(F_i \vert C)\,,$

and so on, for $i \neq j, k, l$. Thus, the joint model can be expressed as

$\begin{align}
p(C \vert F_1, \dots, F_n) & \varpropto p(C, F_1, \dots, F_n) \\
& \varpropto p(C) \ p(F_1\vert C) \ p(F_2\vert C) \ p(F_3\vert C) \ \cdots \\
& \varpropto p(C) \prod_{i=1}^n p(F_i \vert C)\,.
\end{align}$
This means that under the above independence assumptions, the conditional distribution over the class variable C is:

$p(C \vert F_1,\dots,F_n) = \frac{1}{Z} p(C) \prod_{i=1}^n p(F_i \vert C)$
where the evidence $Z = p(F_1, \dots, F_n)$ is a scaling factor dependent only on $F_1,\dots,F_n$, that is, a constant if the values of the feature variables are known.
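
In practice, $Z$ can be computed by summing the unnormalized numerators over all classes,

$Z = \sum_{c} p(C=c) \prod_{i=1}^n p(F_i \vert C=c)\,,$

so each class posterior is obtained by dividing its numerator by this sum.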

The naive Bayes classifier combines this probability model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function $\mathrm{classify}$ defined as follows:

$\mathrm{classify}(f_1,\dots,f_n) = \underset{c}{\operatorname{argmax}} \ p(C=c) \displaystyle\prod_{i=1}^n p(F_i=f_i\vert C=c).$
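
A minimal sketch of this decision rule in plain Python, assuming the class priors and per-feature conditional probabilities have already been estimated; the `priors` and `likelihoods` structures below are hypothetical placeholders, not part of the original post.

```python
import math

def classify(features, priors, likelihoods):
    """MAP decision rule: argmax_c p(c) * prod_i p(f_i | c).

    priors:      dict mapping class c -> p(C=c)
    likelihoods: dict mapping class c -> list of dicts, one per feature,
                 each mapping a feature value v -> p(F_i = v | C = c)
    """
    best_class, best_log_score = None, float("-inf")
    for c, prior in priors.items():
        # Work in log space to avoid underflow when many probabilities are multiplied.
        log_score = math.log(prior)
        for i, value in enumerate(features):
            log_score += math.log(likelihoods[c][i].get(value, 1e-12))  # tiny floor for unseen values
        if log_score > best_log_score:
            best_class, best_log_score = c, log_score
    return best_class
```

Working in log space leaves the argmax unchanged while avoiding numerical underflow; the `1e-12` floor for unseen feature values is only an ad-hoc stand-in for proper smoothing, which the text above does not cover.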

All model parameters (i.e., class priors and feature probability distributions) can be approximated with relative frequencies from the training set. These are maximum likelihood estimates of the probabilities. A class's prior may be calculated by assuming equiprobable classes (i.e., prior = 1 / (number of classes)), or by calculating an estimate for the class probability from the training set (i.e., (prior for a given class) = (number of samples in the class) / (total number of samples)). To estimate the parameters for a feature's distribution, one must assume a distribution or generate nonparametric models for the features from the training set.
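
A sketch of these relative-frequency (maximum likelihood) estimates for discrete features, producing the `priors` and `likelihoods` structures used by the `classify` sketch above; the function and variable names are illustrative.

```python
from collections import Counter

def estimate_parameters(samples, labels):
    """Estimate p(C) and p(F_i | C) by relative frequencies (MLE).

    samples: list of feature tuples (discrete values)
    labels:  list of class labels, same length as samples
    """
    n_features = len(samples[0])
    class_counts = Counter(labels)
    total = len(labels)

    # Class priors: (number of samples in the class) / (total number of samples).
    priors = {c: count / total for c, count in class_counts.items()}

    # Per-class, per-feature counts of observed values.
    value_counts = {c: [Counter() for _ in range(n_features)] for c in class_counts}
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[c][i][v] += 1

    # Conditional probabilities p(F_i = v | C = c).
    likelihoods = {
        c: [{v: n / class_counts[c] for v, n in value_counts[c][i].items()}
            for i in range(n_features)]
        for c in class_counts
    }
    return priors, likelihoods
```

These two structures plug directly into the `classify` sketch above; if equiprobable classes are preferred, `priors` can simply be replaced by 1 / (number of classes) for every class.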

Algorithm

1. Compute the prior probabilities: the class priors $p(C)$ (the evidence $Z = p(F_1, \dots, F_n)$ is only a normalizing constant and does not need its own model);

2. Assume a probability distribution for each feature and estimate the class-conditional likelihoods $p(F_i \vert C)$;

When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution.

Another common technique for handling continuous values is to use binning to discretize the feature values, to obtain a new set of Bernoulli-distributed features.

In general, the distribution method is a better choice if there is a small amount of training data, or if the precise distribution of the data is known. The discretization method tends to do better if there is a large amount of training data because it will learn to fit the distribution of the data. Since naive Bayes is typically used when a large amount of data is available (as more computationally expensive models can generally achieve better accuracy), the discretization method is generally preferred over the distribution method.

3. Compute the posterior probability of each class and pick the class with the largest probability (a complete Gaussian sketch covering steps 1-3 follows this list).
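
Putting steps 1-3 together for continuous features under the Gaussian assumption: this is a from-scratch sketch with plain MLE mean/variance estimates and made-up toy data, not the original post's code.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Steps 1-2: estimate class priors and per-feature Gaussian parameters."""
    by_class = defaultdict(list)
    for x, c in zip(samples, labels):
        by_class[c].append(x)

    priors, params = {}, {}
    for c, rows in by_class.items():
        priors[c] = len(rows) / len(samples)
        params[c] = []
        for values in zip(*rows):  # one tuple of values per feature
            mean = sum(values) / len(values)
            var = sum((v - mean) ** 2 for v in values) / len(values) or 1e-9  # guard zero variance
            params[c].append((mean, var))
    return priors, params

def gaussian_log_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict(x, priors, params):
    """Step 3: pick the class with the highest (log) posterior."""
    scores = {
        c: math.log(priors[c]) + sum(gaussian_log_pdf(v, m, s) for v, (m, s) in zip(x, params[c]))
        for c in priors
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: two continuous features, two classes.
X = [(1.0, 2.1), (1.2, 1.9), (3.8, 6.0), (4.1, 5.7)]
y = ["a", "a", "b", "b"]
priors, params = fit_gaussian_nb(X, y)
print(predict((1.1, 2.0), priors, params))  # expected "a"
```

For the discretization route mentioned above, `numpy.digitize` can map each continuous value into a fixed set of bins first, after which the discrete-feature estimates apply.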
