[Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.
While this dataset is still considered a small dataset -- only a couple hundred points of data -- we'll start to get better results.
The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.
// train data
[{text: 'xxxxxx', label: 'space'}]
// Load train data form the files and train
var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier();
fs.readFile('training_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var trainingData = JSON.parse(data);
train(trainingData);
}
});
function train(trainingData){
console.log("Training");
trainingData.forEach(function(item){
classifier.addDocument(item.text, item.label);
});
var startTime = new Date();
classifier.train();
var endTime = new Date();
var trainingTime = (endTime-startTime)/1000.0;
console.log("Training time:", trainingTime, "seconds");
loadTestData();
}
function loadTestData(){
console.log("Loading test data");
fs.readFile('test_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var testData = JSON.parse(data);
testClassifier(testData);
}
});
}
function testClassifier(testData){
console.log("Testing classifier");
var numCorrect = 0;
testData.forEach(function(item){
var labelGuess = classifier.classify(item.text);
if (labelGuess === item.label){
numCorrect++;
}
});
console.log("Correct %:", numCorrect/testData.length);
saveClassifier(classifier)
}
function saveClassifier(classifier){
classifier.save('classifier.json', function(err, classifier){
if (err){
console.log(err);
} else {
console.log("Classifier saved!");
}
});
}
In a new project, we can test the train result by:
var natural = require('natural');
natural.LogisticRegressionClassifier.load('classifier.json', null, function(err, classifier){
if (err){
console.log(err);
} else {
var testComment = "is this about the sun and moon?";
console.log(classifier.classify(testComment));
}
});
[Javascript] Classify JSON text data with machine learning in Natural的更多相关文章
- [Javascript] Classify text into categories with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression classif ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)
week 3 Classification KNN :基本思想是 input value 类似,就可能是同一类的 Decision Tree Naive Bayes Week 4 Evaluating ...
- 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)
下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 100 Most Popular Machine Learning Video Talks
100 Most Popular Machine Learning Video Talks 26971 views, 1:00:45, Gaussian Process Basics, David ...
- [C5/C6] 机器学习诊断和系统设计(Machine learning Diagnostic and System Desig
机器学习诊断(Machine learning diagnostic) Diagnostic : A test that you can run to gain insight what is / i ...
- [C2P3] Andrew Ng - Machine Learning
##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...
- Machine Learning - XI. Machine Learning System Design机器学习系统的设计(Week 6)
http://blog.csdn.net/pipisorry/article/details/44119187 机器学习Machine Learning - Andrew NG courses学习笔记 ...
随机推荐
- P2633 Count on a tree(主席树)
题目描述 给定一棵N个节点的树,每个点有一个权值,对于M个询问(u,v,k),你需要回答u xor lastans和v这两个节点间第K小的点权.其中lastans是上一个询问的答案,初始为0,即第一个 ...
- Java的几个有用小Util函数(日期处理和http)
/** * 依据日期返回当前日期是一年的第几天 * @param date * @return */ public static int orderDa ...
- DOM基础及DOM操作HTML
文件对象模型(Document Object Model.简称fr=aladdin" target="_blank">DOM).是W3C组织推荐的处理可扩展标 ...
- 八款常用的 Python GUI 开发框架推荐
作为Python开发者,你迟早都会用到图形用户界面来开发应用.本文将推荐一些 Python GUI 框架,希望对你有所帮助,如果你有其他更好的选择,欢迎在评论区留言. Python 的 UI 开发工具 ...
- Docker Network Configuration 高级网络配置
Network Configuration TL;DR When Docker starts, it creates a virtual interface named docker0 on the ...
- Linux下清除系统日志方法
摘要:相信大家都是用过Windows的人.对于Windows下饱受诟病的各种垃圾文件都需要自己想办法删除,不然你的系统将会变得越来越大,越来越迟钝!windows怎么清理垃圾相信大家都知道的,那么li ...
- POJ 1293 网络流 第一题
完全的模板,做多了就好了吧 反向流量真的很有意思,有这样一种说法比较容易理解.”正向是+,反向就是-,其实是等价的.因为每次找到的增广路不一定是最优解里面的,所以再进行后面的操作的时候要重新选择,而反 ...
- IBM将收购Linux发行商红帽公司,继续发力云计算市场
10月29日凌晨消息,IBM和Red Hat当地时间星期日联合宣布,IBM将以340亿美元收购红帽公司(Red Hat).根据两家公司发表的一份联合声明,IBM将以每股190美元的价格,以现金方式收购 ...
- JAVA文件读取FileReader
JAVA文件读取FileReader 导包import java.io.FileReader 创建构造方法public FileReader(String filename),参数是文件的路径及文件名 ...
- 51nod 正整数分组
将一堆正整数分为2组,要求2组的和相差最小. 显然我们可以把所有可能组合成的数求出来. 然后从总和的中间开始往大找,找到了就是其中一个的分组,就可以求出答案了. #include<cstdio& ...