Part of Speech Tagging

Natural Language Processing with Python

Chapter 6.1

The code that builds suffix_fdist has been changed slightly.

 import nltk
 from nltk.corpus import brown  # needs the Brown corpus: nltk.download('brown')

 def common_suffixes_fun():
     # Count the last one, two and three characters of every word in Brown.
     suffix_fdist = nltk.FreqDist()
     for word in brown.words():
         word = word.lower()
         suffix_fdist[word[-1:]] += 1
         suffix_fdist[word[-2:]] += 1
         suffix_fdist[word[-3:]] += 1
     # Keep the 100 most frequent suffixes; ties are broken alphabetically.
     most_frequent_items = sorted(suffix_fdist.items(), key=lambda x: (-x[1], x[0]))[:100]
     return [su[0] for su in most_frequent_items]

 common_suffixes = common_suffixes_fun()

 def pos_features(word):
     # One boolean feature per common suffix.
     features = {}
     for su in common_suffixes:
         features['endswith(%s)' % su] = word.lower().endswith(su)
     return features

 def test_pos():
     tagged_words = brown.tagged_words(categories='news')[:5000]
     featuresets = [(pos_features(word), tag) for (word, tag) in tagged_words]
     # Hold out the first 10% for testing and train on the rest.
     size = int(len(tagged_words) * 0.1)
     train_set, test_set = featuresets[size:], featuresets[:size]
     classifier = nltk.NaiveBayesClassifier.train(train_set)
     print(nltk.classify.accuracy(classifier, test_set))
     classifier.show_most_informative_features(5)

 test_pos()

Output:

0.652
Most Informative Features
endswith(o) = True TO : NN = 423.2 : 1.0
endswith(es) = True DOZ : NN = 319.5 : 1.0
endswith(om) = True WPO : NN = 319.5 : 1.0
endswith(as) = True BEDZ : IN = 303.3 : 1.0
endswith(s) = True BEDZ : IN = 303.3 : 1.0
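
The suffix counting and feature extraction can be tried on a tiny word list without downloading a corpus. The sketch below is only an illustration: `top_suffixes`, `suffix_features` and `toy_words` are hypothetical names that do not appear in the post, and `collections.Counter` stands in for `nltk.FreqDist`. It uses the same ordering rule as the code above (highest count first, ties broken alphabetically).

```python
from collections import Counter


def top_suffixes(words, n=4):
    """Hypothetical helper: return the n most frequent 1-3 character suffixes."""
    counts = Counter()  # stand-in for nltk.FreqDist (an assumption of this sketch)
    for word in words:
        word = word.lower()
        for k in (1, 2, 3):
            counts[word[-k:]] += 1
    # Highest count first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [suffix for suffix, _ in ranked[:n]]


def suffix_features(word, suffixes):
    """Hypothetical helper: one boolean 'endswith(...)' feature per suffix."""
    return {'endswith(%s)' % s: word.lower().endswith(s) for s in suffixes}


# Toy data invented for this example, not taken from the Brown corpus.
toy_words = ["walking", "talking", "walked", "talked", "sing"]
suffixes = top_suffixes(toy_words, n=4)
print(suffixes)  # ['g', 'ing', 'ng', 'd']
print(suffix_features("Running", suffixes))
```

For "Running", the features for `g`, `ing` and `ng` are True and the one for `d` is False. The classifier in the post is trained on dictionaries of this shape, with 100 suffixes instead of 4.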
