[Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.
Although this dataset is still small, with only a couple hundred data points, it is large enough that we start to get better results.
The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
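To make the Naive Bayes side of that comparison concrete, here is a minimal from-scratch sketch of a multinomial Naive Bayes text classifier with add-one smoothing. It is an illustration only and is not how the natural library is implemented; the function names (tokenize, trainNaiveBayes, classifyNaiveBayes) and the two sample documents are made up for this example.

```javascript
// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Count documents per label and word occurrences per label.
// Object.create(null) avoids collisions with inherited keys such as "constructor".
function trainNaiveBayes(docs) {
  var model = {
    numDocs: docs.length,
    labelCounts: Object.create(null),
    wordCounts: Object.create(null),
    totalWords: Object.create(null),
    vocab: Object.create(null)
  };
  docs.forEach(function (doc) {
    var label = doc.label;
    model.labelCounts[label] = (model.labelCounts[label] || 0) + 1;
    model.wordCounts[label] = model.wordCounts[label] || Object.create(null);
    model.totalWords[label] = model.totalWords[label] || 0;
    tokenize(doc.text).forEach(function (word) {
      model.wordCounts[label][word] = (model.wordCounts[label][word] || 0) + 1;
      model.totalWords[label] += 1;
      model.vocab[word] = true;
    });
  });
  return model;
}

// Pick the label maximizing log P(label) + sum of log P(word | label),
// using add-one (Laplace) smoothing so unseen words do not zero out a label.
function classifyNaiveBayes(model, text) {
  var vocabSize = Object.keys(model.vocab).length;
  var words = tokenize(text);
  var bestLabel = null;
  var bestScore = -Infinity;
  Object.keys(model.labelCounts).forEach(function (label) {
    var score = Math.log(model.labelCounts[label] / model.numDocs);
    words.forEach(function (word) {
      var count = model.wordCounts[label][word] || 0;
      score += Math.log((count + 1) / (model.totalWords[label] + vocabSize));
    });
    if (score > bestScore) {
      bestScore = score;
      bestLabel = label;
    }
  });
  return bestLabel;
}

// Made-up training documents in the same {text, label} shape used in this lesson.
var model = trainNaiveBayes([
  { text: 'the rocket reached orbit around the moon', label: 'space' },
  { text: 'the team won the hockey game', label: 'sports' }
]);
console.log(classifyNaiveBayes(model, 'rocket orbit moon'));
console.log(classifyNaiveBayes(model, 'hockey game'));
```

Training here is just counting, which is one reason Naive Bayes tends to train quickly and behave reasonably on small datasets.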
This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.
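The training and test files are assumed to be JSON arrays of objects with a text string and a label string, as the comment at the top of the script indicates. Before training, it can help to check that shape and see how many examples each label has. The following is a minimal sketch; countLabels is a hypothetical helper and the sample records are made up, not taken from the dataset.

```javascript
// Hypothetical helper: validates [{text, label}, ...] and counts examples per label.
function countLabels(data) {
  if (!Array.isArray(data)) {
    throw new Error('Expected an array of {text, label} objects');
  }
  var counts = Object.create(null);
  data.forEach(function (item, index) {
    if (!item || typeof item.text !== 'string' || typeof item.label !== 'string') {
      throw new Error('Item ' + index + ' is missing a string text or label');
    }
    counts[item.label] = (counts[item.label] || 0) + 1;
  });
  return counts;
}

// Made-up sample in the same shape as training_data.json.
var sample = JSON.parse(
  '[{"text": "the shuttle reached orbit", "label": "space"},' +
  ' {"text": "the goalie made a save", "label": "hockey"},' +
  ' {"text": "a new moon of Jupiter", "label": "space"}]'
);
var labelCounts = countLabels(sample);
console.log(labelCounts.space, labelCounts.hockey);
```

A heavily unbalanced count per label is worth knowing about before reading too much into the accuracy figure.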
// Training data format: [{text: 'xxxxxx', label: 'space'}]
// Load the training data from the file and train the classifier
var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier();

fs.readFile('training_data.json', 'utf-8', function(err, data){
  if (err){
    console.log(err);
  } else {
    var trainingData = JSON.parse(data);
    train(trainingData);
  }
});

// Add every labeled document, then train and report how long training took
function train(trainingData){
  console.log("Training");
  trainingData.forEach(function(item){
    classifier.addDocument(item.text, item.label);
  });
  var startTime = new Date();
  classifier.train();
  var endTime = new Date();
  var trainingTime = (endTime-startTime)/1000.0;
  console.log("Training time:", trainingTime, "seconds");
  loadTestData();
}

// Load the held-out test data once training is done
function loadTestData(){
  console.log("Loading test data");
  fs.readFile('test_data.json', 'utf-8', function(err, data){
    if (err){
      console.log(err);
    } else {
      var testData = JSON.parse(data);
      testClassifier(testData);
    }
  });
}

// Classify each test item and report the percentage labeled correctly
function testClassifier(testData){
  console.log("Testing classifier");
  var numCorrect = 0;
  testData.forEach(function(item){
    var labelGuess = classifier.classify(item.text);
    if (labelGuess === item.label){
      numCorrect++;
    }
  });
  console.log("Correct %:", 100 * numCorrect / testData.length);
  saveClassifier(classifier);
}

// Save the trained classifier to disk so another project can load it
function saveClassifier(classifier){
  classifier.save('classifier.json', function(err, classifier){
    if (err){
      console.log(err);
    } else {
      console.log("Classifier saved!");
    }
  });
}
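To check the claim that Naive Bayes beats Logistic Regression on a small dataset, the training and testing steps can be folded into one helper that works with any classifier object exposing addDocument, train, and classify. The following is a minimal sketch; evaluateClassifier and createStubClassifier are hypothetical names for illustration and are not part of natural. The stub stands in for a real classifier so the sketch runs without any dependencies.

```javascript
// Hypothetical helper: trains a classifier, times the training step,
// and reports accuracy (a fraction between 0 and 1) on held-out test data.
function evaluateClassifier(classifier, trainingData, testData) {
  trainingData.forEach(function (item) {
    classifier.addDocument(item.text, item.label);
  });

  var startTime = Date.now();
  classifier.train();
  var trainingSeconds = (Date.now() - startTime) / 1000.0;

  var numCorrect = 0;
  testData.forEach(function (item) {
    if (classifier.classify(item.text) === item.label) {
      numCorrect++;
    }
  });

  return {
    trainingSeconds: trainingSeconds,
    accuracy: testData.length ? numCorrect / testData.length : 0
  };
}

// Stand-in classifier: labels text 'space' if it mentions "moon", otherwise 'sports'.
function createStubClassifier() {
  return {
    addDocument: function () {},
    train: function () {},
    classify: function (text) {
      return /moon/.test(text) ? 'space' : 'sports';
    }
  };
}

// Made-up data; two of the three test items are labeled correctly by the stub.
var result = evaluateClassifier(
  createStubClassifier(),
  [{ text: 'the moon landing', label: 'space' }],
  [
    { text: 'a trip to the moon', label: 'space' },
    { text: 'the hockey final', label: 'sports' },
    { text: 'orbiting the sun', label: 'space' }
  ]
);
console.log(result.accuracy, result.trainingSeconds);
```

With natural installed, passing new natural.BayesClassifier() and then new natural.LogisticRegressionClassifier() in place of the stub, each with the same training and test data, gives a side-by-side view of accuracy and training time.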
In a new project, we can load the saved classifier and try it on new text:
var natural = require('natural');

// Load with the same classifier class that was used for training and saving
// (BayesClassifier in the script above)
natural.BayesClassifier.load('classifier.json', null, function(err, classifier){
  if (err){
    console.log(err);
  } else {
    var testComment = "is this about the sun and moon?";
    console.log(classifier.classify(testComment));
  }
});