Calling NLPIR (ICTCLAS2013) from Python for Chinese Word Segmentation

Environment: Windows 7, VS2008, Python 2.7.3

Step 1: Follow document [2] to wrap the NLPIR library as a Python extension.

Step 2: Create a directory named "nlpir_demo" and copy the "nlpirpy_ext" folder produced at the end of Step 1 into ".../nlpir_demo/".
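The copy in this step can also be scripted. The following is only a minimal sketch: the helper name `prepare_demo_dir`, the temporary directory, and the placeholder file are hypothetical and stand in for wherever Step 1 actually left the "nlpirpy_ext" folder.

```python
import os
import shutil
import tempfile


def prepare_demo_dir(ext_dir, parent_dir):
    """Create parent_dir/nlpir_demo and copy ext_dir into it as nlpirpy_ext."""
    demo_dir = os.path.join(parent_dir, "nlpir_demo")
    if not os.path.isdir(demo_dir):
        os.makedirs(demo_dir)
    target = os.path.join(demo_dir, "nlpirpy_ext")
    # shutil.copytree creates the target itself, so it must not exist yet.
    shutil.copytree(ext_dir, target)
    return target


# Demonstration in a throwaway temporary directory.
# "build" and "placeholder.txt" are made-up names for illustration only.
work = tempfile.mkdtemp()
built = os.path.join(work, "build", "nlpirpy_ext")
os.makedirs(built)
open(os.path.join(built, "placeholder.txt"), "w").close()

copied = prepare_demo_dir(built, work)
print(sorted(os.listdir(copied)))
```

In real use, `ext_dir` would be the "nlpirpy_ext" folder from Step 1 and `parent_dir` the location where "nlpir_demo" should live.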

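Step 3: In ".../nlpir_demo/nlpirpy_ext/", create a file named "C_NLPIR_ICTCLAS2013.py", based on the "seg.py" given at the end of document [2], with the content below. The goal is to wrap NLPIR one level further, as a Python class.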

 # -*- coding: utf-8 -*-
 import os

 import NLPIR


 class C_NLPIR_ICTCLAS2013:

     def __init__(self, s_code='GBK'):
         # Directory passed to NLPIR_Init as the data path: the folder holding this file.
         dataurl = os.path.dirname(os.path.abspath(__file__))
         isinit = 0
         if s_code == 'GBK':
             isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.GBK_CODE)
         elif s_code == 'UTF-8':
             isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.UTF8_CODE)
         elif s_code == 'BIG5':
             isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.BIG5_CODE)
         elif s_code == 'GBK_FANTI':
             isinit = NLPIR.NLPIR_Init(dataurl, NLPIR.GBK_FANTI_CODE)
         # Any other s_code leaves isinit at 0 and is reported as a failure.
         if isinit:
             print 'NLPIR initialized successfully'
         else:
             print 'NLPIR initialization failed'

     def stringSeg(self, s_string, i_bPOStagged=0):
         """
         Function: Process one string.
         Parameters: @s_string - the string to be analyzed;
                     @i_bPOStagged - whether to add POS tags: 0 for no tags, 1 for tags; default: 0.
         Return Value: the segmented result string.
         """
         return NLPIR.NLPIR_ParagraphProcess(s_string, i_bPOStagged)

     def fileSeg(self, s_sourceFile, s_targetFile, i_bPOStagged=0):
         """
         Function: Process one text file and save the result into another file.
         Parameters: @s_sourceFile - the name of the source file to be analyzed;
                     @s_targetFile - the name of the file that stores the results;
                     @i_bPOStagged - whether to add POS tags: 0 for no tags, 1 for tags; default: 0.
         Return Value: the processing speed if processing succeeds, otherwise false.
         """
         return NLPIR.NLPIR_FileProcess(s_sourceFile, s_targetFile, i_bPOStagged)

     def importUserDict(self, s_userDictFile):
         """
         Function: Import a user-defined dictionary from a text file.
         Parameters: @s_userDictFile - the name of the text file holding the user dictionary.
         Return Value: the number of lexical entries imported successfully.
         Open question: what format should the user dictionary file use?
         """
         return NLPIR.NLPIR_ImportUserDict(s_userDictFile)

     def addUserWord(self, s_word):
         """
         Function: Add a word to the user dictionary.
         Parameters: @s_word - the word to add.
         Return Value: 1 if the word is added successfully, otherwise 0.
         """
         return NLPIR.NLPIR_AddUserWord(s_word)

     def saveTheUserDict(self):
         """
         Function: Save the user dictionary to disk.
         Parameters: none.
         Return Value: 1 if the save succeeds, otherwise 0.
         Open question: where on disk is the dictionary saved?
         """
         return NLPIR.NLPIR_SaveTheUsrDic()

     def delUserWord(self, s_word):
         """
         Function: Delete a word from the user dictionary.
         Parameters: @s_word - the word to delete.
         Return Value: -1 if the word does not exist in the user dictionary, otherwise the handle of the deleted word.
         """
         return NLPIR.NLPIR_DelUsrWord(s_word)

     def exit(self):
         """
         Return Value: true if NLPIR exits successfully, otherwise false.
         """
         return NLPIR.NLPIR_Exit()


 if __name__ == '__main__':
     # Quick self-test: initialize NLPIR, then wait for Enter so the message stays visible.
     O_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
     raw_input('\n~!')
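In the `__init__` above, an `s_code` outside the four listed names falls through the if/elif chain, so it is reported only as a generic initialization failure. One possible alternative, shown here as a sketch rather than as part of the original class, is a dictionary lookup that rejects unknown names explicitly. The numeric values and the names `ENCODING_CODES` and `resolve_encoding` are placeholders. In the real extension the values would be `NLPIR.GBK_CODE`, `NLPIR.UTF8_CODE`, `NLPIR.BIG5_CODE`, and `NLPIR.GBK_FANTI_CODE`.

```python
# Placeholder values so the sketch runs without the NLPIR extension.
# In real code, use NLPIR.GBK_CODE, NLPIR.UTF8_CODE, and so on instead.
ENCODING_CODES = {
    "GBK": 0,
    "UTF-8": 1,
    "BIG5": 2,
    "GBK_FANTI": 3,
}


def resolve_encoding(s_code):
    """Return the encoding constant for s_code, or raise ValueError if unknown."""
    try:
        return ENCODING_CODES[s_code]
    except KeyError:
        supported = ", ".join(sorted(ENCODING_CODES))
        raise ValueError(
            "unsupported encoding %r; expected one of: %s" % (s_code, supported)
        )


print(resolve_encoding("UTF-8"))
```

With such a helper, `__init__` would need a single `NLPIR.NLPIR_Init(dataurl, resolve_encoding(s_code))` call, and a misspelled name such as 'utf8' would fail with a clear message.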

Step 4: In ".../nlpir_demo/", create a file named "NLPIR_demo.py" with the content below. It tries out the class C_NLPIR_ICTCLAS2013 defined in ".../nlpir_demo/nlpirpy_ext/C_NLPIR_ICTCLAS2013.py".

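 # -*- coding: utf-8 -*-
 from nlpirpy_ext.C_NLPIR_ICTCLAS2013 import C_NLPIR_ICTCLAS2013

 if __name__ == '__main__':
     # Initialize NLPIR for UTF-8 input, matching the encoding of this source file.
     o_C_NLPIR_ICTCLAS2013 = C_NLPIR_ICTCLAS2013('UTF-8')
     # Wait for Enter so the initialization message can be read.
     raw_input('\n~!')
     # Test input: a run of Chinese terms with no delimiters between them.
     s_test = '1989年春夏之交的政治风波1989年政治风波24小时降雪量24小时降雨量863计划ABC防护训练APEC会议BB机BP机C2系统C3I系统C3系统C4ISR系统C4I系统CCITT建议'
     result = o_C_NLPIR_ICTCLAS2013.stringSeg(s_test)
     # raw_input prints the segmented result as its prompt, then waits for Enter.
     raw_input(result)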
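The demo only prints the string returned by `stringSeg`. To work with individual words, that string has to be split into tokens. The sketch below assumes that tokens are separated by whitespace and that, with `i_bPOStagged=1`, each token is written as `word/tag`. The post does not confirm this output format, so check it against real output first. The helper name `parse_segmented` and the sample strings are made up for illustration.

```python
# -*- coding: utf-8 -*-


def parse_segmented(result, pos_tagged=False):
    """Split a segmented string into words, or (word, tag) pairs if tagged."""
    tokens = result.split()
    if not pos_tagged:
        return tokens
    pairs = []
    for token in tokens:
        # Split on the last slash so a word that itself contains "/" survives.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash at all: treat the whole token as an untagged word.
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


# Hand-written sample strings, not real NLPIR output.
print(parse_segmented("APEC 会议 BB机"))
print(parse_segmented("APEC/x 会议/n", pos_tagged=True))
```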

Step 5: Run ".../nlpir_demo/NLPIR_demo.py", and that's it!
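If the segmented text printed in this step looks garbled, one possible cause is an encoding mismatch. The demo initializes NLPIR with 'UTF-8', while a console on a Chinese-language Windows 7 system commonly uses a GBK code page. That is an assumption about the environment, not something this post states. Under that assumption, the result bytes can be re-encoded before printing. The helper name `recode` is hypothetical.

```python
# -*- coding: utf-8 -*-


def recode(raw, source_encoding, target_encoding):
    """Decode bytes from one encoding and re-encode them in another."""
    return raw.decode(source_encoding).encode(target_encoding)


# Simulated UTF-8 result bytes; in the demo this would be the value of `result`.
utf8_bytes = u"中文分词".encode("utf-8")
gbk_bytes = recode(utf8_bytes, "utf-8", "gbk")
print(repr(gbk_bytes))
```

In the Python 2.7 demo, the last line could then become `raw_input(recode(result, 'utf-8', 'gbk'))`, provided the console really expects GBK.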

Note: For the SWIG tool mentioned in document [2], see document [1], which points to two further articles on it.

References:

[1] SWIG usage examples in Python and Ruby, http://www.cnblogs.com/chanyin/p/3340780.html

[2] NLPIR (ICTCLAS2013) Python edition, http://www.nilday.com/nlpirictclas2013-python%E7%89%88/
