QQ：231469242

欢迎nltk爱好者交流

https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/

Named Entity Recognition with NLTK

命名实体（Named Entity）类别识别

This is a temporary script file.

"""

import nltk

from nltk.corpus import state_union

from nltk.tokenize import PunktSentenceTokenizer

sentence="Bush is a pig in WhiteHouse in America."

words=nltk.word_tokenize(sentence)

tagged=nltk.pos_tag(words)

nameEnt=nltk.ne_chunk(tagged,binary=False)

nameEnt.draw()

This is a temporary script file.

"""

import nltk

from nltk.corpus import state_union

from nltk.tokenize import PunktSentenceTokenizer

train_text=state_union.raw("2005-GWBush.txt")

sample_text=state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer=PunktSentenceTokenizer(train_text)

#分句

tokenized=custom_sent_tokenizer.tokenize(sample_text)

for i in tokenized[0:5]:

    words=nltk.word_tokenize(i)

    tagged=nltk.pos_tag(words)

    nameEnt=nltk.ne_chunk(tagged,binary=False)

    #print(nameEnt)

    nameEnt.draw()

nameEnt=nltk.ne_chunk(tagged,binary=True)

nameEnt=nltk.ne_chunk(tagged,binary=False)

One of the most major forms of chunking in natural language processing is called "Named Entity Recognition." The idea is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.

This can be a bit of a challenge, but NLTK is this built in for us. There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc.

Here's an example:

import nltk

from nltk.corpus import state_union

from nltk.tokenize import PunktSentenceTokenizer

train_text = state_union.raw("2005-GWBush.txt")

sample_text = state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():

    try:

        for i in tokenized[5:]:

            words = nltk.word_tokenize(i)

            tagged = nltk.pos_tag(words)

            namedEnt = nltk.ne_chunk(tagged, binary=True)

            namedEnt.draw()

    except Exception as e:

        print(str(e))

process_content()

Here, with the option of binary = True, this means either something is a named entity, or not. There will be no further detail. The result is:

If you set binary = False, then the result is:

Immediately, you can see a few things. When Binary is False, it picked up the same things, but wound up splitting up terms like White House into "White" and "House" as if they were different, whereas we could see in the binary = True option, the named entity recognition was correct to say White House was part of the same named entity.

Depending on your goals, you may use the binary option how you see fit. Here are the types of Named Entities that you can get if you have binary as false:

NE Type and Examples
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian

Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.

In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.

python风控评分卡建模和风控常识

https://study.163.com/course/introduction.htm?courseId=1005214003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

自然语言18.1_Named Entity Recognition with NLTK的更多相关文章

自然语言18.2_NLTK命名实体识别
QQ:231469242 欢迎nltk爱好者交流 http://blog.csdn.net/u010718606/article/details/50148261 NLTK中对于很多自然语言处理应用有 ...
自然语言12_Tokenizing Words and Sentences with NLTK
https://www.pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/ # -*- coding: utf-8 -*- ...
自然语言处理NLP程序包（NLTK/spaCy）使用总结
NLTK和SpaCy是NLP的Python应用,提供了一些现成的处理工具和数据接口.下面介绍它们的一些常用功能和特性,便于对NLP研究的组成形式有一个基本的了解. NLTK Natural Langu ...
自然语言27_Converting words to Features with NLTK
https://www.pythonprogramming.net/words-as-features-nltk-tutorial/ Converting words to Features with ...
自然语言15_Part of Speech Tagging with NLTK
https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tut ...
【448】NLP, NER, PoS
目录: 停用词 —— stopwords 介词 —— prepositions —— part of speech Named Entity Recognition (NER) 3.1 Stanfor ...
自然语言18_Named-entity recognition
https://en.wikipedia.org/wiki/Named-entity_recognition http://book.51cto.com/art/201107/276852.htm 命 ...
自然语言17_Chinking with NLTK
https://www.pythonprogramming.net/chinking-nltk-tutorial/?completed=/chunking-nltk-tutorial/ 代码 # -* ...
【NLP】干货！Python NLTK结合stanford NLP工具包进行文本处理
干货!详述Python NLTK下如何使用stanford NLP工具包作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...

随机推荐

.net MVC全球化资源使用心得
网上有的我就不说了,我只记录下我碰壁的事情. local资源就不说,这里只说global全局资源文件. 假设新建一个资源文件名称叫做resourceA, 下面几点记录备忘: resouceA就是Get ...
Node.js项目目录介绍
新建的项目结构应该是这样 bin:项目的启动文件,也可以放其他脚本. node_modules:用来存放项目的依赖库. public:用来存放静态文件(css,js,img). routes:路由控制 ...
extjs 箱子布局
a.flex 配置项 flex 配置项不是设置在布局上,而是设置在子项的配置项.每个子项相对的 flex 值都会与全体子项 flex 累加的值相比较,根据此结果,处理每个子项的 flex 最后是多少. ...
“Ceph浅析”系列之二——Ceph概况
本文将对Ceph的基本情况进行概要介绍,以期读者能够在不涉及技术细节的情况下对Ceph建立一个初步印象. 1. 什么是Ceph? Ceph的官方网站Ceph.com上用如下这句话简明扼要地定义了Cep ...
Linux基础知识集锦
查看当前进程ID与当前进程的父进程ID $$ echo $PPID shell脚本之for循环 for ((i=0;i<10;++i)) do echo "hello",$i ...
bzoj1503
treap改了好长时间,erase写错了... #include<iostream> #include<cstdio> #include<cstdlib> usin ...
iOS 蓝牙开发（三）app作为外设被连接的实现(转)
转载自:www.cocoachina.com/ios/20151105/14071.html 原作者:刘彦玮再上一节说了app作为central连接peripheral的情况,这一节介绍如何使用ap ...
iOS正则表达式
//包含数字和字母的密码长度6-16位 -(BOOL) validatePassword:(NSString *)password { //密码正则表达式 NSString *passwordRege ...
精通Web Analytics 2.0 （3）第一章：网站分析的新奇世界
精通Web Analytics 2.0 : 用户中心科学与在线统计艺术第一章:Web Analytics 2.0的新奇世界多年以来,我们很清楚的知道,网站分析能够真正的改革网络上业务的完成方式.那 ...
58. Android一些开发习惯总结
作者:漫步链接:https://www.zhihu.com/question/27227425/answer/35973793 来源:知乎著作权归作者所有.商业转载请联系作者获得授权,非商业转载请 ...

自然语言18.1_Named Entity Recognition with NLTK

sklearn实战-乳腺癌细胞数据挖掘（博主亲自录制视频教程）