自然语言18.1_Named Entity Recognition with NLTK
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程)

QQ:231469242
欢迎nltk爱好者交流
https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/
Named Entity Recognition with NLTK
命名实体(Named Entity)类别识别
This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer sentence="Bush is a pig in WhiteHouse in America."
words=nltk.word_tokenize(sentence)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False) nameEnt.draw()

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text=state_union.raw("2005-GWBush.txt")
sample_text=state_union.raw("2006-GWBush.txt") custom_sent_tokenizer=PunktSentenceTokenizer(train_text)
#分句
tokenized=custom_sent_tokenizer.tokenize(sample_text) for i in tokenized[0:5]:
words=nltk.word_tokenize(i)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False)
#print(nameEnt)
nameEnt.draw()
nameEnt=nltk.ne_chunk(tagged,binary=True)

nameEnt=nltk.ne_chunk(tagged,binary=False)

One of the most major forms of chunking in natural language processing is called "Named Entity Recognition." The idea is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.
This can be a bit of a challenge, but NLTK is this built in for us. There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc.
Here's an example:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt") custom_sent_tokenizer = PunktSentenceTokenizer(train_text) tokenized = custom_sent_tokenizer.tokenize(sample_text) def process_content():
try:
for i in tokenized[5:]:
words = nltk.word_tokenize(i)
tagged = nltk.pos_tag(words)
namedEnt = nltk.ne_chunk(tagged, binary=True)
namedEnt.draw()
except Exception as e:
print(str(e)) process_content()
Here, with the option of binary = True, this means either something is a named entity, or not. There will be no further detail. The result is:

If you set binary = False, then the result is:

Immediately, you can see a few things. When Binary is False, it picked up the same things, but wound up splitting up terms like White House into "White" and "House" as if they were different, whereas we could see in the binary = True option, the named entity recognition was correct to say White House was part of the same named entity.
Depending on your goals, you may use the binary option how you see fit. Here are the types of Named Entities that you can get if you have binary as false:
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian
Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.
In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.
自然语言18.1_Named Entity Recognition with NLTK的更多相关文章
- 自然语言18.2_NLTK命名实体识别
QQ:231469242 欢迎nltk爱好者交流 http://blog.csdn.net/u010718606/article/details/50148261 NLTK中对于很多自然语言处理应用有 ...
- 自然语言12_Tokenizing Words and Sentences with NLTK
https://www.pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/ # -*- coding: utf-8 -*- ...
- 自然语言处理NLP程序包(NLTK/spaCy)使用总结
NLTK和SpaCy是NLP的Python应用,提供了一些现成的处理工具和数据接口.下面介绍它们的一些常用功能和特性,便于对NLP研究的组成形式有一个基本的了解. NLTK Natural Langu ...
- 自然语言27_Converting words to Features with NLTK
https://www.pythonprogramming.net/words-as-features-nltk-tutorial/ Converting words to Features with ...
- 自然语言15_Part of Speech Tagging with NLTK
https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tut ...
- 【448】NLP, NER, PoS
目录: 停用词 —— stopwords 介词 —— prepositions —— part of speech Named Entity Recognition (NER) 3.1 Stanfor ...
- 自然语言18_Named-entity recognition
https://en.wikipedia.org/wiki/Named-entity_recognition http://book.51cto.com/art/201107/276852.htm 命 ...
- 自然语言17_Chinking with NLTK
https://www.pythonprogramming.net/chinking-nltk-tutorial/?completed=/chunking-nltk-tutorial/ 代码 # -* ...
- 【NLP】干货!Python NLTK结合stanford NLP工具包进行文本处理
干货!详述Python NLTK下如何使用stanford NLP工具包 作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...
随机推荐
- 函数也是对象,本片介绍函数的属性、方法、Function()狗仔函数。
1.arguments.length表示实参的个数. 2.arguments.callee.length表示形参个数. function test(a,b,c,d,e,f){ alert(argume ...
- iOS开发小技巧--即时通讯项目:使用富文本在UILabel中显示图片和文字;使用富文本占位显示图片
Label借助富文本显示图片 1.即时通讯项目中语音消息UI的实现,样式如图: 借助富文本在UILabel中显示图片和文字 // 1.创建一个可变的富文本 NSMutableAttributedStr ...
- C#使用委托进行异步编程。
首先引用MSDN中的一段话来描述一下如何使用异步方式.NET Framework 允许您异步调用任何方法. 为此,应定义与您要调用的方法具有相同签名的委托:公共语言运行时会自动使用适当的签名为该委托定 ...
- 求割点 poj 1523
给你一些双向边 求有多少个割点 并输出去掉点这个点 去掉后有几个联通分量 Tarjan #include<stdio.h> #include<algorithm> #inclu ...
- 如何让div显示在embed,flash元素之上
Z-INDEX属性只对块状元素有效,对于flash是没用的,那么我们怎么处理这个问题呢,问大家介绍两种很简便的方法 方法一 把<embed>标记写在<object>之内 方法二 ...
- js-函数eval
eval函数接收一个参数s,如果s不是字符串,则直接返回s.否则执行s语句.如果s语句执行结果是一个值,则返回此值,否则返回undefined. 需要特别注意的是对象声明语法“{}”并不能返回一个值, ...
- C# 通过后台获取浏览器域名
通过httpContext.获取当前地址当前主机域名 string url = HttpContext.Current.Request.Url.Host.ToString();
- iOS 蓝牙开发(四)BabyBluetooth蓝牙库介绍(转)
转载自:http://www.cocoachina.com/ios/20151106/14072.html 原文作者:刘彦玮 BabyBluetooth 是一个最简单易用的蓝牙库,基于CoreBlue ...
- python 学习笔记11(objgraph)
33. objgraph objgraph是Python的一个第三方包.安装之前需要安装xdot. 用途 安装 例子
- 【caffe】cifar10例子之quick_train.sh在windows下的解决方案
@tags caffe 照例还是转写为python脚本: import os caffe_root=os.environ['caffe_root'] caffe_build=os.environ['c ...
