[Javascript] Classify JSON text data with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.
While this dataset is still considered a small dataset -- only a couple hundred points of data -- we'll start to get better results.
The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.
This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.
// train data
[{text: 'xxxxxx', label: 'space'}]
// Load train data form the files and train
var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier();
fs.readFile('training_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var trainingData = JSON.parse(data);
train(trainingData);
}
});
function train(trainingData){
console.log("Training");
trainingData.forEach(function(item){
classifier.addDocument(item.text, item.label);
});
var startTime = new Date();
classifier.train();
var endTime = new Date();
var trainingTime = (endTime-startTime)/1000.0;
console.log("Training time:", trainingTime, "seconds");
loadTestData();
}
function loadTestData(){
console.log("Loading test data");
fs.readFile('test_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var testData = JSON.parse(data);
testClassifier(testData);
}
});
}
function testClassifier(testData){
console.log("Testing classifier");
var numCorrect = 0;
testData.forEach(function(item){
var labelGuess = classifier.classify(item.text);
if (labelGuess === item.label){
numCorrect++;
}
});
console.log("Correct %:", numCorrect/testData.length);
saveClassifier(classifier)
}
function saveClassifier(classifier){
classifier.save('classifier.json', function(err, classifier){
if (err){
console.log(err);
} else {
console.log("Classifier saved!");
}
});
}
In a new project, we can test the train result by:
var natural = require('natural');
natural.LogisticRegressionClassifier.load('classifier.json', null, function(err, classifier){
if (err){
console.log(err);
} else {
var testComment = "is this about the sun and moon?";
console.log(classifier.classify(testComment));
}
});
[Javascript] Classify JSON text data with machine learning in Natural的更多相关文章
- [Javascript] Classify text into categories with machine learning in Natural
In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression classif ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)
Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...
- Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)
week 3 Classification KNN :基本思想是 input value 类似,就可能是同一类的 Decision Tree Naive Bayes Week 4 Evaluating ...
- 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)
下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...
- How do I learn machine learning?
https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644 How Can I Learn X? ...
- 100 Most Popular Machine Learning Video Talks
100 Most Popular Machine Learning Video Talks 26971 views, 1:00:45, Gaussian Process Basics, David ...
- [C5/C6] 机器学习诊断和系统设计(Machine learning Diagnostic and System Desig
机器学习诊断(Machine learning diagnostic) Diagnostic : A test that you can run to gain insight what is / i ...
- [C2P3] Andrew Ng - Machine Learning
##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...
- Machine Learning - XI. Machine Learning System Design机器学习系统的设计(Week 6)
http://blog.csdn.net/pipisorry/article/details/44119187 机器学习Machine Learning - Andrew NG courses学习笔记 ...
随机推荐
- [POI2010]KLO-Blocks(单调栈)
题意 给出N个正整数a[1..N],再给出一个正整数k,现在可以进行如下操作:每次选择一个大于k的正整数a[i],将a[i]减去1,选择a[i-1]或a[i+1]中的一个加上1.经过一定次数的操作后, ...
- 关于post请求“CAUTION: Provisional headers are shown”【转】
在POST请求中偶尔会出现"CAUTION: Provisional headers are shown" 这个警告的意思是说:请求的资源可能会被(扩展/或其它什么机制)屏蔽掉. ...
- centos7;windows下安装和使用spice
感谢朋友支持本博客,欢迎共同探讨交流,因为能力和时间有限,错误之处在所难免,欢迎指正! 假设转载,请保留作者信息. 博客地址:http://blog.csdn.net/qq_21398167 原博文地 ...
- TRIZ系列-创新原理-7-嵌套原理
原理表述例如以下: 1)把一个物体嵌入另外一个物体.然后将这两个物体再嵌入第三个物体,以此类推. 这个原理又叫俄罗斯娃原理,目的是在不影响原有功能的情况下: A) 在须要时.能够降低系统的体积和便于携 ...
- windows server,无桌面服务器 , 批处理更改时区
windows server,无桌面服务器 , 批处理更改时区 time /t cmd.exe /c Control.exe TIMEDATE.CPL,,/Z "China Standard ...
- 线程池系列一:线程池作用及Executors方法讲解
线程池的作用: 线程池作用就是限制系统中执行线程的数量. 根据系统的环境情况,可以自动或手动设置线程数量,达到运行的最佳效果:少了浪费了系统资源,多了造成系统拥挤效率不高.用线程池控制线程数量 ...
- 80.简单搭建nodeJS服务,访问本地站点文件
转自:https://blog.csdn.net/iteye_1217/article/details/82679843 搭建nodejs服务器步骤: 1.安装nodejs服务(从官网下载安装),no ...
- MySQL自定义函数(四十六)
MySQL自定义函数 一.什么是MYSQL自定义函数? mysql当中的自定义函数,我们简称为UDF,它实际上是一种对MySQL扩展的途径,其用法与内置函数相同. 二.自定义函数应该具备哪些条件? 我 ...
- 机器学习(三) Jupyter Notebook, numpy和matplotlib的详细使用 (下)
七.Numpy中的矩阵运算 八.Numpy中的聚合运算 九.Numpy中的arg运算 十.Numpy中的比较和Fancy Indexing 十一.Matplotlib数据可视化基础 十二.数据加载和简 ...
- 使用js实现简单放大镜的效果
实现原理:使用2个div,里面分别放大图片和小图片,在小图片上应该还有一个遮罩层,通过定位遮罩层的位置来定位大图片的相对位置,而且,遮罩层的移动应该和大图片的移动方向相反 关键: 大图片和小图片大小比 ...