Natural Language Processing with Python

Chapter 6.1

由于nltk.FreqDist的排序问题,获取电影文本特征词的代码有些微改动。

 import nltk
from nltk.corpus import movie_reviews as mr def document_features(document,words_features):
document_words=set(document)
features={}
for word in words_features:
features['has(%s)' %word] = (word in document_words)
return features def test_doc_classification():
documents=[(list(mr.words(fileid)),category)
for category in mr.categories()
for fileid in mr.fileids(categories=category)]
all_words_dist=nltk.FreqDist(w.lower() for w in mr.words())
words_freq =sorted(all_words_dist.items(), key=lambda x: (-1*x[1], x[0]))[:2000]
words_features=[word[0] for word in words_freq] featuresets=[(document_features(doc,words_features),c) for (doc,c) in
documents] train_set, test_set= featuresets[100:],featuresets[:100]
classifier=nltk.NaiveBayesClassifier.train(train_set) print nltk.classify.accuracy(classifier,test_set) classifier.show_most_informative_features(5)

结果如下,accuracy为0.86:

0.86
Most Informative Features
has(outstanding) = True pos : neg = 10.4 : 1.0
has(seagal) = True neg : pos = 8.7 : 1.0
has(mulan) = True pos : neg = 8.1 : 1.0
has(wonderfully) = True pos : neg = 6.3 : 1.0
has(damon) = True pos : neg = 5.7 : 1.0

Document Classification的更多相关文章

  1. Support Vector Machines for classification

    Support Vector Machines for classification To whet your appetite for support vector machines, here’s ...

  2. Classification of text documents: using a MLComp dataset

    注:原文代码链接http://scikit-learn.org/stable/auto_examples/text/mlcomp_sparse_document_classification.html ...

  3. [Tensorflow] RNN - 04. Work with CNN for Text Classification

    Ref: Combining CNN and RNN for spoken language identification Ref: Convolutional Methods for Text [1 ...

  4. 论文列表——text classification

    https://blog.csdn.net/BitCs_zt/article/details/82938086 列出自己阅读的text classification论文的列表,以后有时间再整理相应的笔 ...

  5. Link-based Classification相关数据集

    Link-based Classification相关数据集 Datasets Document Classification Datasets: CiteSeer: The CiteSeer dat ...

  6. #论文阅读# Universial language model fine-tuing for text classification

    论文链接:https://aclweb.org/anthology/P18-1031 对文章内容的总结 文章研究了一些在general corous上pretrain LM,然后把得到的model t ...

  7. Text Classification

    Text Classification For purpose of word embedding extrinsic evaluation, especially downstream task. ...

  8. Machine Learning Algorithms Study Notes(2)--Supervised Learning

    Machine Learning Algorithms Study Notes 高雪松 @雪松Cedro Microsoft MVP 本系列文章是Andrew Ng 在斯坦福的机器学习课程 CS 22 ...

  9. Similarity-based Learning

    Similarity-based approaches to machine learning come from the idea that the best way to make a predi ...

随机推荐

  1. iOS 10推送通知开发

    原文地址:Developing Push Notifications for iOS 10,译者:李剑飞 虽然通知经常被过度使用,但是通知确实是一种获得用户关注和通知他们需要更新或行动的有效方式.iO ...

  2. 'customerService' for bean class [com.cd.service.business.customer.impl.CustomerService]

    'customerService' for bean class [com.cd.service.business.customer.impl.CustomerService] 此处报错是因为我 把一 ...

  3. 由浅到深理解java反射

    1.基础概念 class类: 1.1java是面向对象的,但是在java中存在两种东西不是面向对象的 一种是普通的数据类型,这也是封装数据类存在的原因. 二种是静态静态成员. 1.2所以我们首先要理解 ...

  4. EGL接口介绍-----Android OpenGL ES底层开发

    引自:http://www.cnitblog.com/zouzheng/archive/2011/05/30/74326.html EGL 是 OpenGL ES 和底层 Native 平台视窗系统之 ...

  5. elasticsearch 性能调优

    所有的修改都可以在elasticsearch.yml里面修改,也可以通过api来修改.推荐用api比较灵活 1.不同分片之间的数据同步是一个很大的花费,默认是1s同步,如果我们不要求实时性,我们可以执 ...

  6. Ubuntu 9.10+ apache2.2 +Django的配置

    1.首先安装mod_python apt-get install libapache2-mod-python2.6 (Ubuntu 9.10默认安装的是python 2.6版,如果是2.5可改为 li ...

  7. 微信小程序页面-页面跳转失败WAService.js:3 navigateTo:fail url not in app.json

    微信小程序新建页面的要素一是新建的文件名称和其子文件的名称最好一致,不然容易出问题,在小程序页面跳转中如果出现WAService.js:3 navigateTo:fail url not in app ...

  8. foo bar的意思

    有些朋友问:foo, bar是什么意思, 为什么C++书籍中老见到这个词. 我google了一下, 发现没有很好的中文答案.这个问题,在维基百科上有很好的回答.在这里翻译给大家. 译文: 术语foob ...

  9. WebSphere MQ 入门指南【转】

    WebSphere MQ 入门指南 转自 WebSphere MQ 入门指南 - 大CC - 博客园http://www.cnblogs.com/me115/p/3456407.html 这是一篇入门 ...

  10. android4.0蓝牙使能的详细解析(转)

    源:http://www.cnblogs.com/xiaochao1234/p/3818193.html 本文详细分析了android4.0 中蓝牙使能的过程,相比较android2.3,4.0中的蓝 ...