Configure the Stanford segmenter for NLTK

>>> from nltk.tokenize.stanford_segmenter import StanfordSegmenter

>>> segmenter = StanfordSegmenter(path_to_jar='stanford-segmenter-3.8.0.jar', path_to_sihan_corpora_dict='./data', path_to_model='./data/pku.gz', path_to_dict='./data/dict-chris6.ser.gz')

>>> sentence = u'这是斯坦福中文分词器测试'

>>> segmenter.segment(sentence)

u'\u8fd9 \u662f \u65af\u5766\u798f \u4e2d\u6587 \u5206\u8bcd\u5668 \u6d4b\u8bd5\n'

>>> segmenter.segment_file('test.simp.utf8')

u'\u9762\u5bf9 \u65b0 \u4e16\u7eaa \uff0c \u4e16\u754c \u5404\u56fd \u4eba\u6c11 \u7684 \u5171\u540c \u613f\u671b \u662f \uff1a \u7ee7\u7eed \u53d1\u5c55 \u4eba\u7c7b \u4ee5\u5f80 \u521b\u9020 \u7684 \u4e00\u5207 \u6587\u660e \u6210\u679c \uff0c \u514b\u670d 20 \u4e16\u7eaa \u56f0\u6270 \u7740 \u4eba\u7c7b \u7684 \u6218\u4e89 \u548c \u8d2b\u56f0 \u95ee\u9898 \uff0c \u63a8\u8fdb \u548c\u5e73 \u4e0e \u53d1\u5c55 \u7684 \u5d07\u9ad8 \u4e8b\u4e1a \uff0c \u521b\u9020 \u4e00\u4e2a \u7f8e\u597d \u7684 \u4e16\u754c \u3002\n'

>>> outfile = open('outfile', 'w')

>>> result = segmenter.segment(sentence)

>>> outfile.write(result.encode('UTF-8'))

>>> outfile.close()

Configure the Stanford segmenter for NLTK的更多相关文章

在 NLTK 中使用 Stanford NLP 工具包
转载自:http://www.zmonster.me/2016/06/08/use-stanford-nlp-package-in-nltk.html 目录 NLTK 与 Stanford NLP 安 ...
NLTK和Stanford NLP两个工具的安装配置
这里安装的是两个自然语言处理工具,NLTK和Stanford NLP. 声明:笔者操作系统是Windows10,理论上Windows都可以: 版本号:NLTK 3.2 Stanford NLP 3.6 ...
[转]NLP Tasks
Natural Language Processing Tasks and Selected References I've been working on several natural langu ...
国产深度学习框架mindspore-1.3.0 gpu版本无法进行源码编译
官网地址: https://www.mindspore.cn/install 所有依赖环境进行sudo make install 安装,最终报错: 错误记录信息: cat /tmp/mind ...
【NLP】干货！Python NLTK结合stanford NLP工具包进行文本处理
干货!详述Python NLTK下如何使用stanford NLP工具包作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...
[转]【NLP】干货！Python NLTK结合stanford NLP工具包进行文本处理阅读目录
[NLP]干货!Python NLTK结合stanford NLP工具包进行文本处理原贴: https://www.cnblogs.com/baiboy/p/nltk1.html 阅读目录目 ...
Stanford Word Segmenter使用
1,下载 Stanford Word Segmenter软件包: Download Stanford Word Segmenter version 2014-06-16 2,在eclipse上建立一个 ...
Stanford Word Segmenter的特定领域训练
有没有人自己训练过Stanford Word Segmenter分词器,因为我想做特定领域的分词,但在使用Stanford Word Segmenter分词的时候发现对于我想做的领域的一些词分词效果并 ...
【NLP】Python NLTK处理原始文本
Python NLTK 处理原始文本作者:白宁超 2016年11月8日22:45:44 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的一种自然语言工具包,其收集的大量公开 ...

随机推荐

mysql数据库字符集相关操作（修改表字段编码，使其支持emoji表情）
普通的UTF8编码是不支持emoji表情插入的,会报异常: Caused by: java.sql.SQLException: Incorrect string value: '\xF0\x9F\x9 ...
《More Accurate Question Answering on Freebase》文献笔记
bast-2015-CIKM CIKM全称是International Conference on Information and Knowledge Management 这篇文章主要采用采用lea ...
解决bootstrap 模态框数据清除验证清空
$("#switchModel").on("hidden.bs.modal", function () { $('#ware-form')[0].reset() ...
angularjs 的ng-disabled属性操作
ng-readonly:不可用,但是可以提交数据 ng-disabled: 属性是控制标签是否可用(不可用且无法传值) 表达式控制: <input class="col-md-2 fo ...
Java UTC时间与本地时间互相转换
协调世界时,又称世界统一时间.世界标准时间.国际协调时间.由于英文(CUT)和法文(TUC)的缩写不同,作为妥协,简称UTC. 这套时间系统被应用于许多互联网和万维网的标准中,例如,网络时间协议就是协 ...
HDU - 6440 Dream 2018中国大学生程序设计竞赛 - 网络选拔赛
给定的$p$是素数,要求给定一个加法运算表和乘法运算表,使$(m+n)^p = m^p +n^p(0 \leq m,n < p)$. 因为给定的p是素数,根据费马小定理得 \((m+n) ...
POJ 2154 color (polya + 欧拉优化)
Beads of N colors are connected together into a circular necklace of N beads (N<=1000000000). You ...
/* * 有五个学生，每个学生有3门课的成绩，从键盘输入以上数据 *（包括学生号，姓名，三门课成绩），计算出平均成绩， *将原有的数据和计算出的平均分数存放在磁盘文件"stud"中。 */
1.Student类:类中有五个变量,分别是学号,姓名,三门成绩 package test3; public class Student { private int num; private Stri ...
Spring Data Solr入门
如何将Solr的应用集成到Spring中? SpringDataSolr就是为了方便Solr的开发所研制的一个框架,其底层是对SolrJ的封装. SpringDataSolr入门小Demo 首先目录结 ...
JavaScript 示例
JavaScript 示例 <html lang="en"> <head> <meta charset="UTF-8"> & ...

Configure the Stanford segmenter for NLTK

Configure the Stanford segmenter for NLTK的更多相关文章

随机推荐

热门专题