Python + NLTK Natural Language Processing Study: Environment Setup

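First, download the required software by following http://nltk.org/install.html. You will need Python, NumPy, pandas, and matplotlib. Once everything is installed, run nltk.download() to download the corpora. In the downloader window that opens, select "All packages" and click Download.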
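Before launching the downloader, it can help to confirm that the packages listed above are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative assumptions, not part of NLTK.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Packages the article lists, plus nltk itself (hypothetical checklist).
REQUIRED = ["numpy", "pandas", "matplotlib", "nltk"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All packages found; next step: import nltk, then nltk.download()")
```

If nothing is reported missing, `import nltk` followed by `nltk.download()` in an interactive session opens the downloader.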

Note that the Download Directory can be changed, but its last path component must be nltk_data.

For example, you can change it to D:\nltk_data
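The directory rule is easy to check in code. This is a small sketch of that check, assuming the rule exactly as stated here (the last path component must be `nltk_data`); the function name `is_valid_download_dir` is made up for illustration.

```python
from pathlib import PureWindowsPath


def is_valid_download_dir(path):
    """True if the last component of `path` is exactly 'nltk_data'."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    return PureWindowsPath(path).name == "nltk_data"


print(is_valid_download_dir(r"D:\nltk_data"))  # True
print(is_valid_download_dir(r"D:\NLTK"))       # False

# With NLTK installed, a single package can be fetched into that directory
# without the GUI, for example:
#   import nltk
#   nltk.download("gutenberg", download_dir=r"D:\nltk_data")
```

Passing `download_dir` skips the graphical downloader, which may be convenient when only a few corpora are needed.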

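The downloader is slow and often fails outright. When that happens, there are two options:

1 Download the packages you need directly from http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml

2 Use an archive that someone else has already packaged, such as the one at the following link: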

https://d11.baidupcs.com/file/b8adca61f3d951733a1508c538fb139f?bkt=p3-1400b8adca61f3d951733a1508c538fb139f7a5a378700001237cfb6&xcode=24ee57e4c00df669f8114f90862e7a576f1a5fd0dfa92cd70b2977702d3e6764&fid=655353904-250528-168229026483879&time=1498354932&sign=FDTAXGERLBHS-DCb740ccc5511e5e8fedcff06b081203-farXKS5Ut9qIEKMP6uCJBn0sFLk%3D&to=d11&size=305647542&sta_dx=305647542&sta_cs=1637&sta_ft=zip&sta_ct=7&sta_mt=7&fm2=MH,Ningbo,Netizen-anywhere,,sichuan,ct&newver=1&newfm=1&secfm=1&flow_ver=3&pkey=1400b8adca61f3d951733a1508c538fb139f7a5a378700001237cfb6&sl=83034191&expires=8h&rt=sh&r=640794177&mlogid=4068121183592230425&vuk=1681792858&vbdid=634719214&fin=nltk_data.zip&fn=nltk_data.zip&rtype=1&iv=0&dp-logid=4068121183592230425&dp-callid=0.1.1&hps=1&csl=300&csign=YEkhhUZEK82GGRxxvymOo9t9Y2E%3D&by=themis
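A manually downloaded archive still has to be unpacked into a folder named nltk_data. The sketch below shows one way to do that with the standard library. It assumes the archive's top-level entries are the category folders (for example `corpora/`); if the archive already contains a top-level `nltk_data` folder, extract it into the parent directory instead. The function name is hypothetical.

```python
import zipfile
from pathlib import Path


def unpack_into_nltk_data(archive, parent_dir):
    """Extract `archive` into <parent_dir>/nltk_data and return that folder."""
    target = Path(parent_dir) / "nltk_data"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        # Assumption: entries look like 'corpora/gutenberg/...'.
        zf.extractall(target)
    return target


# Example call (paths are placeholders):
#   unpack_into_nltk_data(r"D:\downloads\nltk_data.zip", "D:\\")
```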

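Note that packages you download yourself must be placed inside a folder named nltk_data; otherwise the import fails. For example, I downloaded them into a folder named NLTK, and the import raised the following error.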

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

Traceback (most recent call last):

File "<pyshell#0>", line 1, in <module>

from nltk.book import *

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\book.py", line 20, in <module>

text1 = Text(gutenberg.words('melville-moby_dick.txt'))

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 116, in __getattr__

self.__load()

File "E:\python2.7.11\lib\site-packages\nltk-3.2.4-py2.7.egg\nltk\corpus\util.py", line 81, in __load

except LookupError: raise e

LookupError:

**********************************************************************

Resource u'corpora/gutenberg' not found. Please use the NLTK

Downloader to obtain the resource: >>> nltk.download()

Searched in:

- 'C:\\Users\\Administrator/nltk_data'

- 'C:\\nltk_data'

- 'D:\\nltk_data'

- 'E:\\nltk_data'

- 'E:\\python2.7.11\\nltk_data'

- 'E:\\python2.7.11\\lib\\nltk_data'

- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'

NLTK searches the following paths. Because none of them contains an nltk_data folder, it cannot find the files:

- 'C:\\Users\\Administrator/nltk_data'

- 'C:\\nltk_data'

- 'D:\\nltk_data'

- 'E:\\nltk_data'

- 'E:\\python2.7.11\\nltk_data'

- 'E:\\python2.7.11\\lib\\nltk_data'

- 'C:\\Users\\Administrator\\AppData\\Roaming\\nltk_data'
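To make the lookup concrete, here is a simplified sketch of searching a list of base directories for a resource such as `corpora/gutenberg`. It is not NLTK's actual implementation (which, among other things, can also read zipped packages); `find_resource` is an illustrative name. With NLTK installed, the real list of search directories is available as `nltk.data.path`.

```python
import os


def find_resource(resource, search_paths):
    """Return the first existing <base>/<resource>, or None if none exists."""
    for base in search_paths:
        # Resource names use '/', so split and rejoin with the OS separator.
        candidate = os.path.join(base, *resource.split("/"))
        if os.path.exists(candidate):
            return candidate
    return None


# Two of the directories from the error message above; prints None unless
# one of them actually holds corpora/gutenberg on this machine.
print(find_resource("corpora/gutenberg", [r"C:\nltk_data", r"D:\nltk_data"]))

# With NLTK installed, the same check against the real search list:
#   import nltk
#   print(find_resource("corpora/gutenberg", nltk.data.path))
```

A folder named NLTK never appears among the candidates, which is why the data placed there was not found.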

Renaming the folder to nltk_data fixes the problem.

The import now succeeds:

>>> from nltk.book import *

*** Introductory Examples for the NLTK Book ***

Loading text1, ..., text9 and sent1, ..., sent9

Type the name of the text or sentence to view it.

Type: 'texts()' or 'sents()' to list the materials.

text1: Moby Dick by Herman Melville 1851

text2: Sense and Sensibility by Jane Austen 1811

text3: The Book of Genesis

text4: Inaugural Address Corpus

text5: Chat Corpus

text6: Monty Python and the Holy Grail

text7: Wall Street Journal

text8: Personals Corpus

text9: The Man Who Was Thursday by G . K . Chesterton 1908

Let's run a quick test. The following command finds every place where monstrous occurs in text1:

>>> text1.concordance('monstrous')

Displaying 11 of 11 matches:

ong the former , one was of a most monstrous size . ... This came towards us ,

ON OF THE PSALMS . " Touching that monstrous bulk of the whale or ork we have r

ll over with a heathenish array of monstrous clubs and spears . Some were thick

d as you gazed , and wondered what monstrous cannibal and savage could ever hav

that has survived the flood ; most monstrous and most mountainous ! That Himmal

they might scout at Moby Dick as a monstrous fable , or still worse and more de

th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l

ing Scenes . In connexion with the monstrous pictures of whales , I am strongly

ere to enter upon those still more monstrous stories of them which are to be fo

ght have been rummaged out of this monstrous cabinet there is no telling . But

of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
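To get a feel for what a concordance does, here is a rough pure-Python sketch: it scans a token list for a word, ignoring case, and returns each hit with a few tokens of context. This is only an illustration; NLTK's own `concordance` aligns output by character width and is implemented differently. The function name and the `width` parameter are assumptions.

```python
def simple_concordance(tokens, word, width=4):
    """Return one context string per occurrence of `word` (case-insensitive)."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = tokens[max(0, i - width):i]     # up to `width` tokens before
            right = tokens[i + 1:i + 1 + width]    # up to `width` tokens after
            lines.append(" ".join(left + [token] + right))
    return lines


tokens = "one was of a most monstrous size and a Monstrous bulk".split()
for line in simple_concordance(tokens, "monstrous", width=2):
    print(line)
# a most monstrous size and
# and a Monstrous bulk
```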

The environment is now set up, so the next posts can start on NLTK proper.
