In this lesson, we will learn how to train a Naive Bayes classifier and a Logistic Regression classifier - basic machine learning algorithms - on JSON text data, and classify it into categories.

While this dataset is still considered a small dataset -- only a couple hundred points of data -- we'll start to get better results.

The general rule is that Logistic Regression will work better than Naive Bayes, but only if there is enough data. Since this is still a pretty small dataset, Naive Bayes works better here. Generally, Logistic Regression takes longer to train as well.

This uses data from Ana Cachopo: http://ana.cachopo.org/datasets-for-single-label-text-categorization.

// train data

[{text: 'xxxxxx', label: 'space'}]
// Load train data form the files and train

var natural = require('natural');
var fs = require('fs');
var classifier = new natural.BayesClassifier(); fs.readFile('training_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var trainingData = JSON.parse(data);
train(trainingData);
}
}); function train(trainingData){
console.log("Training");
trainingData.forEach(function(item){
classifier.addDocument(item.text, item.label);
});
var startTime = new Date();
classifier.train();
var endTime = new Date();
var trainingTime = (endTime-startTime)/1000.0;
console.log("Training time:", trainingTime, "seconds");
loadTestData();
} function loadTestData(){
console.log("Loading test data");
fs.readFile('test_data.json', 'utf-8', function(err, data){
if (err){
console.log(err);
} else {
var testData = JSON.parse(data);
testClassifier(testData);
}
});
} function testClassifier(testData){
console.log("Testing classifier");
var numCorrect = 0;
testData.forEach(function(item){
var labelGuess = classifier.classify(item.text);
if (labelGuess === item.label){
numCorrect++;
}
});
console.log("Correct %:", numCorrect/testData.length);
   saveClassifier(classifier)
}
function saveClassifier(classifier){
classifier.save('classifier.json', function(err, classifier){
if (err){
console.log(err);
} else {
console.log("Classifier saved!");
}
});
}

In a new project, we can test the train result by:

var natural = require('natural');

natural.LogisticRegressionClassifier.load('classifier.json', null, function(err, classifier){
if (err){
console.log(err);
} else {
var testComment = "is this about the sun and moon?";
console.log(classifier.classify(testComment));
}
});

[Javascript] Classify JSON text data with machine learning in Natural的更多相关文章

  1. [Javascript] Classify text into categories with machine learning in Natural

    In this lesson, we will learn how to train a Naive Bayes classifier or a Logistic Regression classif ...

  2. Coursera, Big Data 4, Machine Learning With Big Data (week 1/2)

    Week 1 Machine Learning with Big Data KNime - GUI based Spark MLlib - inside Spark CRISP-DM Week 2, ...

  3. Coursera, Big Data 4, Machine Learning With Big Data (week 3/4/5)

    week 3 Classification KNN :基本思想是 input value 类似,就可能是同一类的 Decision Tree Naive Bayes Week 4 Evaluating ...

  4. 斯坦福大学公开课机器学习:machine learning system design | data for machine learning(数据量很大时,学习算法表现比较好的原理)

    下图为四种不同算法应用在不同大小数据量时的表现,可以看出,随着数据量的增大,算法的表现趋于接近.即不管多么糟糕的算法,数据量非常大的时候,算法表现也可以很好. 数据量很大时,学习算法表现比较好的原理: ...

  5. How do I learn machine learning?

    https://www.quora.com/How-do-I-learn-machine-learning-1?redirected_qid=6578644   How Can I Learn X? ...

  6. 100 Most Popular Machine Learning Video Talks

    100 Most Popular Machine Learning Video Talks 26971 views, 1:00:45,  Gaussian Process Basics, David ...

  7. [C5/C6] 机器学习诊断和系统设计(Machine learning Diagnostic and System Desig

    机器学习诊断(Machine learning diagnostic) Diagnostic : A test that you can run to gain insight what is / i ...

  8. [C2P3] Andrew Ng - Machine Learning

    ##Advice for Applying Machine Learning Applying machine learning in practice is not always straightf ...

  9. Machine Learning - XI. Machine Learning System Design机器学习系统的设计(Week 6)

    http://blog.csdn.net/pipisorry/article/details/44119187 机器学习Machine Learning - Andrew NG courses学习笔记 ...

随机推荐

  1. zabbix4.0 使用nginx前端安装

    注:环境需求:centos7 1.安装阿里云yum源: rpm -ivh https://mirrors.aliyun.com/zabbix/zabbix/4.1/rhel/7/x86_64/zabb ...

  2. vue使用,问题

    参考链接:https://cn.vuejs.org/v2/guide/index.html *)[Vue warn]: Error in v-on handler: "TypeError: ...

  3. 洛谷 P1604 B进制星球

    P1604 B进制星球 题目背景 进制题目,而且还是个计算器~~ 题目描述 话说有一天,小Z乘坐宇宙飞船,飞到一个美丽的星球.因为历史的原因,科技在这个美丽的星球上并不很发达,星球上人们普遍采用B(2 ...

  4. 局部特化 & 特化

    注意,显式特化不是一个模板.如果是类型跟显式特化一样,那么不是实例化. 显式特化类的函数,不需要再加template,因为不是模板方法. 特化类的函数跟模板类不一定要一样,但是一样更好. 不支持局部特 ...

  5. POJ3904 Sky Code【容斥原理】

    题目链接: http://poj.org/problem?id=3904 题目大意: 给你N个整数.从这N个数中选择4个数,使得这四个数的公约数为1.求满足条件的 四元组个数. 解题思路: 四个数的公 ...

  6. Kinect for Windows V2 SDK+ VS2012 环境搭建

    眼下使用的SDK版本号是KinectSDK-v2.0-PublicPreview1409-Setup.exe. 下载地址:http://www.microsoft.com/en-us/download ...

  7. Java 8 时间日期库的20个使用演示样例

    除了lambda表达式,stream以及几个小的改进之外,Java 8还引入了一套全新的时间日期API,在本篇教程中我们将通过几个简单的任务演示样例来学习怎样使用Java 8的这套API.Java对日 ...

  8. CentOS用rpm升级glibc

    CentOS用rpm升级glibc #! /bin/sh # update glibc to 2.23 for CentOS 6 wget http://cbs.centos.org/kojifile ...

  9. JAVA并发-为现有的线程安全类添加原子方法

    JAVA中有许多线程安全的基础模块类,一般情况下,这些基础模块类能满足我们需要的所有操作,但更多时候,他们并不能满足我们所有的需要.此时,我们需要想办法在不破坏已有的线程安全类的基础上添加一个新的原子 ...

  10. LSTM入门学习——本质上就是比RNN的隐藏层公式稍微复杂了一点点而已

    LSTM入门学习 摘自:http://blog.csdn.net/hjimce/article/details/51234311 下面先给出LSTM的网络结构图: 看到网络结构图好像很复杂的样子,其实 ...