Natural Language Processing with Python

Chapter 6.1

由于nltk.FreqDist的排序问题,获取电影文本特征词的代码有些微改动。

 import nltk
from nltk.corpus import movie_reviews as mr def document_features(document,words_features):
document_words=set(document)
features={}
for word in words_features:
features['has(%s)' %word] = (word in document_words)
return features def test_doc_classification():
documents=[(list(mr.words(fileid)),category)
for category in mr.categories()
for fileid in mr.fileids(categories=category)]
all_words_dist=nltk.FreqDist(w.lower() for w in mr.words())
words_freq =sorted(all_words_dist.items(), key=lambda x: (-1*x[1], x[0]))[:2000]
words_features=[word[0] for word in words_freq] featuresets=[(document_features(doc,words_features),c) for (doc,c) in
documents] train_set, test_set= featuresets[100:],featuresets[:100]
classifier=nltk.NaiveBayesClassifier.train(train_set) print nltk.classify.accuracy(classifier,test_set) classifier.show_most_informative_features(5)

结果如下,accuracy为0.86:

0.86
Most Informative Features
has(outstanding) = True pos : neg = 10.4 : 1.0
has(seagal) = True neg : pos = 8.7 : 1.0
has(mulan) = True pos : neg = 8.1 : 1.0
has(wonderfully) = True pos : neg = 6.3 : 1.0
has(damon) = True pos : neg = 5.7 : 1.0

Document Classification的更多相关文章

  1. Support Vector Machines for classification

    Support Vector Machines for classification To whet your appetite for support vector machines, here’s ...

  2. Classification of text documents: using a MLComp dataset

    注:原文代码链接http://scikit-learn.org/stable/auto_examples/text/mlcomp_sparse_document_classification.html ...

  3. [Tensorflow] RNN - 04. Work with CNN for Text Classification

    Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...

  4. 论文列表——text classification

    https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...

  5. Link-based Classification相关数据集

    Link-based Classification相关数据集 Datasets Document Classification Datasets: CiteSeer: The CiteSeer dat ...

  6. #论文阅读# Universial language model fine-tuing for text classification

    论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...

  7. Text Classification

    Text Classification For purpose of word embedding extrinsic evaluation, especially downstream task. ...

  8. Machine Learning Algorithms Study Notes(2)--Supervised Learning

    Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 本系列文章是Andrew Ng 在斯坦福的机器学习课程 CS 22 ...

  9. Similarity-based Learning

    Similarity-based approaches to machine learning come from the idea that the best way to make a predi ...

随机推荐

  1. 读《Ext.JS.4.First.Look》随笔

    Ext JS 4是最大的改革已经取得了Ext框架.这些变化包括一个新类系统,引入一个新的平台,许多API变化和改进,和新组件,如新图表和新画组件.Ext JS 4是更快,更稳定,易于使用.(注意:Ex ...

  2. Ubuntu下安装Node.js

    下载Linux Binaries (.tar.gz)二进制包 解压 重命名为node 移动到/usr/local/目录下 创建软连接 ln -s /usr/local/node/bin/node   ...

  3. Windows API 之 CreateThread、WaitForSingleObject(未完)

    WaitForSingleObject Waits until the specified object is in the signaled state or the time-out interv ...

  4. Windows API 之 CreateFile

    Creates or opens a file or I/O device. The most commonly used I/O devices are as follows: file, file ...

  5. 【转】使IFRAME在iOS设备上支持滚动

    原文链接: Scroll IFRAMEs on iOS原文日期: 2014年07月02日 翻译日期: 2014年07月10日翻译人员: 铁锚 很长时间以来, iOS设备上Safari中超出边界的元素将 ...

  6. JSP标准标签库(JSTL)--SQL标签库 sql

    了解即可.SQL标签库 No. 功能分类 标签名称 描述 1 数据源标签 <sql:setDataSource> 设置要使用的数据源名称 2 数据库操作标签 <sql:query&g ...

  7. 黄聪:基于Asp.net的CMS系统We7架设实验(环境WIN7,SQL2005,.NET3.5)(初学者参考贴)

    http://www.cnblogs.com/huangcong/archive/2010/03/30/1700348.html

  8. UIView 面面观

    原创:转载请注明出处 1.UIView: 一个视图对象控制该区域的渲染,同时也控制内容的交互. 2.UIView的功能就是:展示.渲染.交互 3.UIView 和很多其他视图控件的默认tag值是0,所 ...

  9. Python 函数简介之一

    面向对象--->对象------>Class 面向过程--->过程------>def 函数式编程--->函数---->def 函数的介绍: def fun1(): ...

  10. Python3基础 定义无参数无返回值函数 调用会输出hello world的函数

    镇场诗: 诚听如来语,顿舍世间名与利.愿做地藏徒,广演是经阎浮提. 愿尽吾所学,成就一良心博客.愿诸后来人,重现智慧清净体.-------------------------------------- ...