Naive Bayes is a commonly used classifier because the idea behind it is simple. It is called "naive" because it assumes that the features used for classification are conditionally independent given the class; this assumption makes classification very easy, but costs some accuracy.
The full derivation can be found in《统计学习方法》.
The derivation gives y = argmax_ck P(Y=ck) · P(X=x | Y=ck), so we need to estimate the prior probability P(Y=ck) and the conditional probability P(X=x | Y=ck).
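Spelled out, the conditional-independence assumption is exactly what lets the joint probability P(X=x|Y=ck) factor over the individual features (a brief restatement of the standard result, written here in the notation of《统计学习方法》):

$$ y = \arg\max_{c_k} P(Y=c_k)\prod_{j=1}^{n} P\bigl(X^{(j)}=x^{(j)} \mid Y=c_k\bigr) $$

so only the class priors and the per-feature conditional probabilities need to be estimated.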
The concrete example used below comes from: http://blog.163.com/jiayouweijiewj@126/blog/static/1712321772010102802635243/.
I use four MapReduce jobs in total. Because the multinomial model is adopted, the prior is P(c) = (total number of words in class c) / (total number of words in the whole training set), and the class-conditional probability is P(tk|c) = (number of occurrences of word tk across the documents of class c + 1) / (total number of words in class c + |V|), where |V| is the number of distinct words. The input is:
1:Chinese Beijing Chinese
1:Chinese Chinese Shanghai
1:Chinese Macao
0:Tokyo Japan Chinese
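To make the formulas concrete on this training set (a quick sanity check against the job outputs listed below): class 1 contains 8 words in total and class 0 contains 3, so the priors are P(1) = 8/11 and P(0) = 3/11; the vocabulary has |V| = 6 distinct words (Chinese, Beijing, Shanghai, Macao, Tokyo, Japan); and, for example, P(Chinese|1) = (5+1)/(8+6) = 6/14 ≈ 0.4286 while P(Chinese|0) = (1+1)/(3+6) = 2/9 ≈ 0.2222.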
1. One MapReduce job counts the number of words in each class; this is needed later for the prior probabilities.
Its output is:
0              3
1              8
 
2. One MapReduce job computes the conditional probabilities; its output is:
0:Chinese    0.2222222222222222
0:Japan      0.2222222222222222
0:Tokyo      0.2222222222222222
1:Beijing    0.14285714285714285
1:Chinese    0.42857142857142855
1:Macao      0.14285714285714285
1:Shanghai   0.14285714285714285
 
3. One MapReduce job counts the number of distinct words; its output is:
num is 6
4. The last MapReduce job performs the prediction.
 
The implementation of each MapReduce job is described below:
1. Counting the words in each class is straightforward: emit the class label as the key and sum up the word counts in the reducer.
The code:
package hadoop.MachineLearning.Bayes.Pro;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PriorProbability {// counts the words in each class, used later for the prior probabilities
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "hdfs://10.107.8.110:9000/Bayes/Bayes_input/";
        String output = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro/";
        Job job = Job.getInstance(conf, "PriorProbability");
        job.setJarByClass(hadoop.MachineLearning.Bayes.Pro.PriorProbability.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        if (!job.waitForCompletion(true))
            return;
    }
}

package hadoop.MachineLearning.Bayes.Pro;// (separate file)

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String[] line = ivalue.toString().split(":| ");// line[0] is the class label, the rest are words
        int size = line.length - 1;// number of words on this line
        context.write(new Text(line[0]), new Text(String.valueOf(size)));
    }
}

package hadoop.MachineLearning.Bayes.Pro;// (separate file)

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, IntWritable> {

    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;// total number of words in this class
        for (Text val : values) {
            sum += Integer.parseInt(val.toString());
        }
        context.write(_key, new IntWritable(sum));
    }
}
 
 
2. The second job counts the number of distinct words in the corpus. My own approach here is not great: every input word is emitted under the same key, a combiner uses a set to drop duplicates on each node, and the reducer uses another set to merge everything and output the number of distinct words. A better approach would probably be to aggregate by the word itself first and then count the distinct words (a minimal sketch of that idea is included after the code below). But I have only just learned MapReduce and do not yet know how to chain jobs, and this Bayes implementation already uses four of them, so I decided not to complicate it further.

package hadoop.MachineLearning.Bayes.Count;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Count {// counts the number of distinct words in the corpus
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Count");
        String input = "hdfs://10.107.8.110:9000/Bayes/Bayes_input";
        String output = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";
        job.setJarByClass(hadoop.MachineLearning.Bayes.Count.Count.class);
        job.setMapperClass(MyMapper.class);
        job.setCombinerClass(MyCombiner.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.setInputPaths(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        if (!job.waitForCompletion(true))
            return;
    }
}

package hadoop.MachineLearning.Bayes.Count;// (separate file)

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String[] line = ivalue.toString().split(":| ");// line[0] is the class label
        String key = "1";
        for (int i = 1; i < line.length; i++) {
            context.write(new Text(key), new Text(line[i]));// emit every word under the same key so they all reach a single reduce call
        }
    }
}

package hadoop.MachineLearning.Bayes.Count;// (separate file)

import java.io.IOException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyCombiner extends Reducer<Text, Text, Text, Text> {// first drop duplicate words locally on each node with a set

    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Set<String> set = new HashSet<String>();
        for (Text val : values) {
            set.add(val.toString());
        }
        for (Iterator<String> it = set.iterator(); it.hasNext();) {
            context.write(new Text("1"), new Text(it.next()));
        }
    }
}

package hadoop.MachineLearning.Bayes.Count;// (separate file)

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, Text> {// after the combiner, deduplicate with a set again and output the number of distinct words

    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        Set<String> set = new HashSet<String>();
        for (Text val : values) {
            set.add(val.toString());
        }
        context.write(new Text("num is "), new Text(String.valueOf(set.size())));
    }
}
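For reference, here is a minimal sketch of the alternative mentioned above: key on the word itself so that duplicates collapse during the shuffle, then count distinct keys in a single reducer. This is untested and only illustrates the idea; the class names are made up, the driver is omitted, and it relies on job.setNumReduceTasks(1) so that every word reaches the same reducer.

package hadoop.MachineLearning.Bayes.Count;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DistinctWordMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String[] line = ivalue.toString().split(":| ");// line[0] is the class label, skip it
        for (int i = 1; i < line.length; i++) {
            context.write(new Text(line[i]), NullWritable.get());// the word itself is the key, so duplicates merge in the shuffle
        }
    }
}

package hadoop.MachineLearning.Bayes.Count;// (separate file)

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DistinctWordReducer extends Reducer<Text, NullWritable, Text, IntWritable> {

    private int distinct = 0;

    public void reduce(Text key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        distinct++;// with a single reducer there is exactly one reduce() call per distinct word
    }

    protected void cleanup(Context context) throws IOException, InterruptedException {
        context.write(new Text("num is "), new IntWritable(distinct));
    }
}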
 
 
3. Computing the conditional probabilities. This needs the per-class count of each word, the total number of words in each class, and the number of distinct words in the corpus; all of these are available from the previous outputs, and I load them into maps here. Some words never occur under a class, for example P(Japan | yes) and P(Tokyo | yes); treating them as 0 would force the whole conditional product to 0, so add-one smoothing is used (see the link above). One more detail: a lot of conditional probabilities are produced, and they really need an efficient storage and lookup structure. Because my skills are limited I simply keep them in an in-memory map, which would be very inefficient for large data sets.
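Concretely, Tokyo and Japan never occur under class 1, so without smoothing P(Tokyo|1) = P(Japan|1) = 0 and any document containing either word would get probability 0 for class 1; with add-one smoothing they become (0+1)/(8+6) = 1/14 ≈ 0.0714, which is exactly the noFind fallback value used later in the prediction reducer.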

package hadoop.MachineLearning.Bayes.Cond;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CondiPro {// computes the conditional probabilities
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "hdfs://10.107.8.110:9000/Bayes/Bayes_input";
        String output = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Con";
        String proPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";// output of the earlier per-class word-count job
        String countPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";// output of the earlier distinct-word count
        conf.set("propath", proPath);
        conf.set("countPath", countPath);
        Job job = Job.getInstance(conf, "ConditionPro");
        job.setJarByClass(hadoop.MachineLearning.Bayes.Cond.CondiPro.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);// the reducer writes DoubleWritable values
        FileInputFormat.setInputPaths(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        if (!job.waitForCompletion(true))
            return;
    }
}

package hadoop.MachineLearning.Bayes.Cond;// (separate file)

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        String[] line = ivalue.toString().split(":| ");
        for (int i = 1; i < line.length; i++) {
            String key = line[0] + ":" + line[i];// the key is "class:word"
            context.write(new Text(key), new IntWritable(1));
        }
    }
}

package hadoop.MachineLearning.Bayes.Cond;// (separate file)

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {

    public Map<String, Integer> map;
    public int count = 0;

    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        String proPath = conf.get("propath");
        String countPath = conf.get("countPath");
        map = Utils.getMapFormHDFS(proPath);// word counts per class
        count = Utils.getCountFromHDFS(countPath);// number of distinct words
    }

    public void reduce(Text _key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;// occurrences of this "class:word" pair
        for (IntWritable val : values) {
            sum += val.get();
        }
        int type = Integer.parseInt(_key.toString().split(":")[0]);
        double probability = 0.0;
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            if (type == Integer.parseInt(entry.getKey())) {
                probability = (sum + 1) * 1.0 / (entry.getValue() + count);// add-one smoothed conditional probability
            }
        }
        context.write(_key, new DoubleWritable(probability));
    }
}

package hadoop.MachineLearning.Bayes.Cond;// (separate file)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class Utils {

    // reads "key value" lines from every file under input into a map with integer values
    // (note: the split below assumes key and value are separated by a space; TextOutputFormat's default separator is a tab, so adjust the pattern if necessary)
    public static Map<String, Integer> getMapFormHDFS(String input) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    map.put(temp[0], Integer.parseInt(temp[1]));
                }
                reader.close();
            }
        }
        return map;
    }

    // same as above but parses the values as doubles (used for the conditional probabilities)
    public static Map<String, Double> getMapFormHDFS(String input, boolean j) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        Map<String, Double> map = new HashMap<String, Double>();
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    map.put(temp[0], Double.parseDouble(temp[1]));
                }
                reader.close();
            }
        }
        return map;
    }

    // reads the "num is <n>" line produced by the Count job
    public static int getCountFromHDFS(String input) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        int count = 0;
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    count = Integer.parseInt(temp[1]);
                }
                reader.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {// small manual test of the helpers
        String proPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
        String countPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count/";
        Map<String, Integer> map = Utils.getMapFormHDFS(proPath);
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "->" + entry.getValue());
        }
        int count = Utils.getCountFromHDFS(countPath);
        System.out.println("count is " + count);
    }
}
 
4. Prediction. For an input such as Chinese, Chinese, Chinese, Tokyo, Japan, the mapper emits the line once for each class (0 and 1) as type:words; the reducer then looks each word up in the conditional probabilities and simply multiplies them together, along with the prior.
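Plugging in the numbers from the training set above, the two products the reducer computes for Chinese Chinese Chinese Tokyo Japan work out to roughly (8/11)·(6/14)^3·(1/14)·(1/14) ≈ 2.9×10^-4 for class 1 and (3/11)·(2/9)^3·(2/9)·(2/9) ≈ 1.5×10^-4 for class 0, so the document is assigned to class 1.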

package hadoop.MachineLearning.Bayes.Predict;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Predict {// prediction
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = "hdfs://10.107.8.110:9000/Bayes/Predict_input";
        String output = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Predict";
        String condiProPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Con";
        String proPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
        String countPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count";
        conf.set("condiProPath", condiProPath);
        conf.set("proPath", proPath);
        conf.set("countPath", countPath);
        Job job = Job.getInstance(conf, "Predict");
        job.setJarByClass(hadoop.MachineLearning.Bayes.Predict.Predict.class);
        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.setInputPaths(job, new Path(input));
        FileOutputFormat.setOutputPath(job, new Path(output));
        if (!job.waitForCompletion(true))
            return;
    }
}

package hadoop.MachineLearning.Bayes.Predict;// (separate file)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

    public Map<String, Integer> map = new HashMap<String, Integer>();

    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        String proPath = conf.get("proPath");
        map = Utils.getMapFormHDFS(proPath);// the keys of this map are the class labels
    }

    public void map(LongWritable ikey, Text ivalue, Context context)
            throws IOException, InterruptedException {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            context.write(new Text(entry.getKey()), ivalue);// tag every input line with every class label so the reducer can score each class
        }
    }
}

package hadoop.MachineLearning.Bayes.Predict;// (separate file)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MyReducer extends Reducer<Text, Text, Text, DoubleWritable> {

    public Map<String, Double> mapDouble = new HashMap<String, Double>();// the conditional probabilities
    public Map<String, Integer> mapInteger = new HashMap<String, Integer>();// the word counts per class
    public Map<String, Double> noFind = new HashMap<String, Double>();// fallback probability for words never seen in a class
    public Map<String, Double> prePro = new HashMap<String, Double>();// the computed prior probabilities

    public void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        String condiProPath = conf.get("condiProPath");
        String proPath = conf.get("proPath");
        String countPath = conf.get("countPath");
        mapDouble = Utils.getMapFormHDFS(condiProPath, true);
        mapInteger = Utils.getMapFormHDFS(proPath);
        int count = Utils.getCountFromHDFS(countPath);
        for (Map.Entry<String, Integer> entry : mapInteger.entrySet()) {
            noFind.put(entry.getKey(), (1.0 / (count + entry.getValue())));// add-one smoothing with a word count of 0
        }
        int sum = 0;
        for (Map.Entry<String, Integer> entry : mapInteger.entrySet()) {
            sum += entry.getValue();
        }
        for (Map.Entry<String, Integer> entry : mapInteger.entrySet()) {
            prePro.put(entry.getKey(), (entry.getValue() * 1.0 / sum));// prior = words in the class / total words
        }
    }

    public void reduce(Text _key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        String type = _key.toString();
        double pro = 1.0;
        for (Text val : values) {
            String[] words = val.toString().split(" ");
            for (int i = 0; i < words.length; i++) {
                String condi = type + ":" + words[i];
                if (mapDouble.get(condi) != null) {// the word occurs in this class, so a learned conditional probability exists
                    pro = pro * mapDouble.get(condi);
                } else {// otherwise fall back to the smoothed default
                    pro = pro * noFind.get(type);
                }
            }
        }
        pro = pro * prePro.get(type);
        context.write(new Text(type), new DoubleWritable(pro));
    }
}

package hadoop.MachineLearning.Bayes.Predict;// (separate file)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.LineReader;

public class Utils {// identical to the Utils class in the Cond package

    // reads "key value" lines from every file under input into a map with integer values
    public static Map<String, Integer> getMapFormHDFS(String input) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    map.put(temp[0], Integer.parseInt(temp[1]));
                }
                reader.close();
            }
        }
        return map;
    }

    // same as above but parses the values as doubles (used for the conditional probabilities)
    public static Map<String, Double> getMapFormHDFS(String input, boolean j) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        Map<String, Double> map = new HashMap<String, Double>();
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    map.put(temp[0], Double.parseDouble(temp[1]));
                }
                reader.close();
            }
        }
        return map;
    }

    // reads the "num is <n>" line produced by the Count job
    public static int getCountFromHDFS(String input) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(input);
        FileSystem fs = path.getFileSystem(conf);
        FileStatus[] stats = fs.listStatus(path);
        int count = 0;
        for (int i = 0; i < stats.length; i++) {
            if (stats[i].isFile()) {
                FSDataInputStream infs = fs.open(stats[i].getPath());
                LineReader reader = new LineReader(infs, conf);
                Text line = new Text();
                while (reader.readLine(line) > 0) {
                    String[] temp = line.toString().split(" ");
                    count = Integer.parseInt(temp[1]);
                }
                reader.close();
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {// small manual test of the helpers
        String proPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Pro";
        String countPath = "hdfs://10.107.8.110:9000/Bayes/Bayes_output/Count/";
        Map<String, Integer> map = Utils.getMapFormHDFS(proPath);
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "->" + entry.getValue());
        }
        int count = Utils.getCountFromHDFS(countPath);
        System.out.println("count is " + count);
    }
}
 
 
 
 
 
 
 
