Configure the Stanford segmenter for NLTK

>>> from nltk.tokenize.stanford_segmenter import StanfordSegmenter

>>> segmenter = StanfordSegmenter(path_to_jar='stanford-segmenter-3.8.0.jar', path_to_sihan_corpora_dict='./data', path_to_model='./data/pku.gz', path_to_dict='./data/dict-chris6.ser.gz')

>>> sentence = u'这是斯坦福中文分词器测试'

>>> segmenter.segment(sentence)

u'\u8fd9 \u662f \u65af\u5766\u798f \u4e2d\u6587 \u5206\u8bcd\u5668 \u6d4b\u8bd5\n'

>>> segmenter.segment_file('test.simp.utf8')

u'\u9762\u5bf9 \u65b0 \u4e16\u7eaa \uff0c \u4e16\u754c \u5404\u56fd \u4eba\u6c11 \u7684 \u5171\u540c \u613f\u671b \u662f \uff1a \u7ee7\u7eed \u53d1\u5c55 \u4eba\u7c7b \u4ee5\u5f80 \u521b\u9020 \u7684 \u4e00\u5207 \u6587\u660e \u6210\u679c \uff0c \u514b\u670d 20 \u4e16\u7eaa \u56f0\u6270 \u7740 \u4eba\u7c7b \u7684 \u6218\u4e89 \u548c \u8d2b\u56f0 \u95ee\u9898 \uff0c \u63a8\u8fdb \u548c\u5e73 \u4e0e \u53d1\u5c55 \u7684 \u5d07\u9ad8 \u4e8b\u4e1a \uff0c \u521b\u9020 \u4e00\u4e2a \u7f8e\u597d \u7684 \u4e16\u754c \u3002\n'

>>> outfile = open('outfile', 'w')

>>> result = segmenter.segment(sentence)

>>> outfile.write(result.encode('UTF-8'))

>>> outfile.close()

Configure the Stanford segmenter for NLTK的更多相关文章

在 NLTK 中使用 Stanford NLP 工具包
转载自:http://www.zmonster.me/2016/06/08/use-stanford-nlp-package-in-nltk.html 目录 NLTK 与 Stanford NLP 安 ...
NLTK和Stanford NLP两个工具的安装配置
这里安装的是两个自然语言处理工具,NLTK和Stanford NLP. 声明:笔者操作系统是Windows10,理论上Windows都可以: 版本号:NLTK 3.2 Stanford NLP 3.6 ...
[转]NLP Tasks
Natural Language Processing Tasks and Selected References I've been working on several natural langu ...
国产深度学习框架mindspore-1.3.0 gpu版本无法进行源码编译
官网地址: https://www.mindspore.cn/install 所有依赖环境进行sudo make install 安装,最终报错: 错误记录信息: cat /tmp/mind ...
【NLP】干货！Python NLTK结合stanford NLP工具包进行文本处理
干货!详述Python NLTK下如何使用stanford NLP工具包作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...
[转]【NLP】干货！Python NLTK结合stanford NLP工具包进行文本处理阅读目录
[NLP]干货!Python NLTK结合stanford NLP工具包进行文本处理原贴: https://www.cnblogs.com/baiboy/p/nltk1.html 阅读目录目 ...
Stanford Word Segmenter使用
1,下载 Stanford Word Segmenter软件包: Download Stanford Word Segmenter version 2014-06-16 2,在eclipse上建立一个 ...
Stanford Word Segmenter的特定领域训练
有没有人自己训练过Stanford Word Segmenter分词器,因为我想做特定领域的分词,但在使用Stanford Word Segmenter分词的时候发现对于我想做的领域的一些词分词效果并 ...
【NLP】Python NLTK处理原始文本
Python NLTK 处理原始文本作者:白宁超 2016年11月8日22:45:44 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的一种自然语言工具包,其收集的大量公开 ...

随机推荐

Python openpyxl、pandas操作Excel方法简介与具体实例
本篇重点讲解windows系统下 Python3.5中第三方excel操作库-openpyxl: 其实Python第三方库有很多可以操作Excel,如:xlrd,xlwt,xlwings甚至注明的数据 ...
Python：遍历一个目录下所有的文件及文件夹，然后计算每个文件的字符和line的小程序
编写了一个遍历一个目录下所有的文件及文件夹,然后计算每个文件的字符和line的小程序,先把程序贴出来. #coding=utf-8 ''' Created on 2014年7月14日 @author: ...
C#设计模式(6)——原型模式（Prototype Pattern）（转）
一.引言在软件系统中,当创建一个类的实例的过程很昂贵或很复杂,并且我们需要创建多个这样类的实例时,如果我们用new操作符去创建这样的类实例,这未免会增加创建类的复杂度和耗费更多的内存空间,因为这样在 ...
appium启动
from appium import webdriver from time import sleep capabilitise = { "platformName": " ...
前端学习历程--js--原型&闭包
一.数据类型 1.值类型:undefined, number, string, boolean,不是对象 2.引用类型:函数.数组.对象.null.new Number(10)都是对象 3.引用类型判 ...
fcrackzip （zip密码破解工具）
现在做一个例子,首先生成一个带有密码的zip的包 zip -P hujhh test.zip test1.txt test2,txt 可以看到密码是5位的纯字母现在就用我们的这个软件开始破解 fcr ...
Bubble Sort (找规律)
通过模拟之后我们发现对于每一个位置上的数他都有一个规律,那就是先左移然后在右移.然后仔细发现可以知道,先右移的距离是前面比该数大的个数.右移就直接右移到目标位置了.然后用一个树状数组从左到右边扫边加就 ...
(Review cs231n) Optimized Methods
Mini-batch SGD的步骤: 1.Sample a batch of data 2.Forward prop it through the graph,get loss 3.backprop ...
sublime text 入门
sublime text3入门教程 2017年07月19日 09:15:51 阅读数:13736 作者:sam976 转载需征得作者本人同意,谢谢. 1.介绍所谓工欲善其事必先利其器,编码过程合理熟 ...
mysql插入数据报错1366
数据表插入中文数据报错 Warning Code : 1366 Incorrect string value: '\xE5\x9C\xA8' for column 'name' at row 1 原因 ...

Configure the Stanford segmenter for NLTK

Configure the Stanford segmenter for NLTK的更多相关文章

随机推荐

热门专题