In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier, two basic machine learning algorithms, on JSON text data and use them to classify text into categories.

This dataset is still small, only a couple hundred data points, but it is large enough that we'll start to get better results.

The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
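To check this trade-off on your own data, the same measurement can be run against each classifier. The sketch below is a minimal illustration, assuming records shaped like {text, label}; `benchmark` and `makeMajorityClassifier` are hypothetical helpers, not part of Natural. The stand-in classifier only exists so the snippet runs without the library installed.

```javascript
// Hypothetical helper: train a fresh classifier, then report training time
// (in seconds) and accuracy (as a fraction between 0 and 1).
// `makeClassifier` must return an object with addDocument/train/classify.
function benchmark(makeClassifier, trainingData, testData) {
    var classifier = makeClassifier();
    trainingData.forEach(function (item) {
        classifier.addDocument(item.text, item.label);
    });

    var start = Date.now();
    classifier.train();
    var seconds = (Date.now() - start) / 1000;

    var numCorrect = testData.filter(function (item) {
        return classifier.classify(item.text) === item.label;
    }).length;
    return { seconds: seconds, accuracy: numCorrect / testData.length };
}

// Stand-in for illustration only: always predicts the most frequent
// training label. Useful as a baseline, not as a real classifier.
function makeMajorityClassifier() {
    var counts = Object.create(null);
    var best = null;
    return {
        addDocument: function (text, label) {
            counts[label] = (counts[label] || 0) + 1;
        },
        train: function () {
            Object.keys(counts).forEach(function (label) {
                if (best === null || counts[label] > counts[best]) {
                    best = label;
                }
            });
        },
        classify: function () {
            return best;
        }
    };
}

var result = benchmark(
    makeMajorityClassifier,
    [{ text: 'a', label: 'space' }, { text: 'b', label: 'space' }, { text: 'c', label: 'autos' }],
    [{ text: 'd', label: 'space' }, { text: 'e', label: 'autos' }]
);
console.log('accuracy:', result.accuracy, 'training seconds:', result.seconds);
```

With Natural installed, the factory argument could be `function () { return new natural.BayesClassifier(); }` or `function () { return new natural.LogisticRegressionClassifier(); }`, which makes the two runs directly comparable.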

This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.

// Training data format (test_data.json uses the same shape)

[{"text": "xxxxxx", "label": "space"}]
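Because JSON.parse only accepts strict JSON (keys and strings in double quotes), it can help to check the file's shape before training. The following is a minimal sketch, assuming every record needs a non-empty string `text` and `label`; `parseLabeledData` is a hypothetical helper name, not part of Natural.

```javascript
// Hypothetical helper: parse a JSON string and keep only records that
// have a non-empty string `text` and a non-empty string `label`.
function parseLabeledData(jsonString) {
    var records = JSON.parse(jsonString); // throws SyntaxError on invalid JSON
    if (!Array.isArray(records)) {
        throw new Error('Expected a JSON array of {text, label} records');
    }
    return records.filter(function (item) {
        return item !== null &&
            typeof item === 'object' &&
            typeof item.text === 'string' && item.text.length > 0 &&
            typeof item.label === 'string' && item.label.length > 0;
    });
}

// The second record has an empty text and no label, so it is dropped.
var sample = '[{"text": "the rocket reached orbit", "label": "space"}, {"text": ""}]';
var cleaned = parseLabeledData(sample);
console.log(cleaned.length, 'valid record(s)');
```

Dropping malformed records silently is a design choice for this sketch; logging or rejecting them may be preferable when the data is curated by hand.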
// Load the training data from the file and train

var natural = require('natural');
var fs = require('fs');

var classifier = new natural.BayesClassifier();

// Read the training set, then start training.
fs.readFile('training_data.json', 'utf-8', function(err, data){
    if (err){
        console.log(err);
    } else {
        var trainingData = JSON.parse(data);
        train(trainingData);
    }
});

// Add every labeled example, train, and report how long training took.
function train(trainingData){
    console.log("Training");
    trainingData.forEach(function(item){
        classifier.addDocument(item.text, item.label);
    });
    var startTime = new Date();
    classifier.train();
    var endTime = new Date();
    var trainingTime = (endTime - startTime) / 1000.0;
    console.log("Training time:", trainingTime, "seconds");
    loadTestData();
}

function loadTestData(){
    console.log("Loading test data");
    fs.readFile('test_data.json', 'utf-8', function(err, data){
        if (err){
            console.log(err);
        } else {
            var testData = JSON.parse(data);
            testClassifier(testData);
        }
    });
}

// Classify each test item and count how many guesses match the true label.
function testClassifier(testData){
    console.log("Testing classifier");
    var numCorrect = 0;
    testData.forEach(function(item){
        var labelGuess = classifier.classify(item.text);
        if (labelGuess === item.label){
            numCorrect++;
        }
    });
    // Multiply by 100 so the printed value is a percentage, not a fraction.
    console.log("Correct %:", (numCorrect / testData.length) * 100);
    saveClassifier(classifier);
}

// Persist the trained classifier so another project can load it.
function saveClassifier(classifier){
    classifier.save('classifier.json', function(err, classifier){
        if (err){
            console.log(err);
        } else {
            console.log("Classifier saved!");
        }
    });
}
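A single overall accuracy figure can hide a category the classifier rarely gets right. The sketch below breaks the count down per label, assuming test items use the {text, label} shape; `accuracyByLabel` is a hypothetical helper, and `stubClassify` stands in for a trained classifier so the snippet runs on its own.

```javascript
// Hypothetical helper: count correct and total predictions for each label.
// `classify` is any function that maps a text string to a label.
function accuracyByLabel(classify, testData) {
    var stats = Object.create(null); // no inherited keys to clash with labels
    testData.forEach(function (item) {
        if (!stats[item.label]) {
            stats[item.label] = { correct: 0, total: 0 };
        }
        stats[item.label].total++;
        if (classify(item.text) === item.label) {
            stats[item.label].correct++;
        }
    });
    return stats;
}

// Stand-in classifier for illustration only.
function stubClassify(text) {
    return text.indexOf('moon') !== -1 ? 'space' : 'sports';
}

var demoData = [
    { text: 'the moon landing', label: 'space' },
    { text: 'a new telescope', label: 'space' },
    { text: 'the final score', label: 'sports' }
];

var report = accuracyByLabel(stubClassify, demoData);
Object.keys(report).forEach(function (label) {
    console.log(label + ':', report[label].correct + '/' + report[label].total);
});
```

With the trained classifier, the first argument could be `function (text) { return classifier.classify(text); }`; wrapping the call keeps the method bound to its classifier.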

In a new project, we can load the saved classifier and test it:

var natural = require('natural');

// Load with the same class that saved the file (BayesClassifier above).
// The second argument is the stemmer; passing null keeps the default.
natural.BayesClassifier.load('classifier.json', null, function(err, classifier){
    if (err){
        console.log(err);
    } else {
        var testComment = "is this about the sun and moon?";
        console.log(classifier.classify(testComment));
    }
});
