自然语言18.1_Named Entity Recognition with NLTK
sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程)

QQ:231469242
欢迎nltk爱好者交流
https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/
Named Entity Recognition with NLTK
命名实体(Named Entity)类别识别
This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer sentence="Bush is a pig in WhiteHouse in America."
words=nltk.word_tokenize(sentence)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False) nameEnt.draw()

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text=state_union.raw("2005-GWBush.txt")
sample_text=state_union.raw("2006-GWBush.txt") custom_sent_tokenizer=PunktSentenceTokenizer(train_text)
#分句
tokenized=custom_sent_tokenizer.tokenize(sample_text) for i in tokenized[0:5]:
words=nltk.word_tokenize(i)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False)
#print(nameEnt)
nameEnt.draw()
nameEnt=nltk.ne_chunk(tagged,binary=True)

nameEnt=nltk.ne_chunk(tagged,binary=False)

One of the most major forms of chunking in natural language processing is called "Named Entity Recognition." The idea is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.
This can be a bit of a challenge, but NLTK is this built in for us. There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc.
Here's an example:
import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt") custom_sent_tokenizer = PunktSentenceTokenizer(train_text) tokenized = custom_sent_tokenizer.tokenize(sample_text) def process_content():
try:
for i in tokenized[5:]:
words = nltk.word_tokenize(i)
tagged = nltk.pos_tag(words)
namedEnt = nltk.ne_chunk(tagged, binary=True)
namedEnt.draw()
except Exception as e:
print(str(e)) process_content()
Here, with the option of binary = True, this means either something is a named entity, or not. There will be no further detail. The result is:

If you set binary = False, then the result is:

Immediately, you can see a few things. When Binary is False, it picked up the same things, but wound up splitting up terms like White House into "White" and "House" as if they were different, whereas we could see in the binary = True option, the named entity recognition was correct to say White House was part of the same named entity.
Depending on your goals, you may use the binary option how you see fit. Here are the types of Named Entities that you can get if you have binary as false:
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian
Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.
In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.
自然语言18.1_Named Entity Recognition with NLTK的更多相关文章
- 自然语言18.2_NLTK命名实体识别
QQ:231469242 欢迎nltk爱好者交流 http://blog.csdn.net/u010718606/article/details/50148261 NLTK中对于很多自然语言处理应用有 ...
- 自然语言12_Tokenizing Words and Sentences with NLTK
https://www.pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/ # -*- coding: utf-8 -*- ...
- 自然语言处理NLP程序包(NLTK/spaCy)使用总结
NLTK和SpaCy是NLP的Python应用,提供了一些现成的处理工具和数据接口.下面介绍它们的一些常用功能和特性,便于对NLP研究的组成形式有一个基本的了解. NLTK Natural Langu ...
- 自然语言27_Converting words to Features with NLTK
https://www.pythonprogramming.net/words-as-features-nltk-tutorial/ Converting words to Features with ...
- 自然语言15_Part of Speech Tagging with NLTK
https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tut ...
- 【448】NLP, NER, PoS
目录: 停用词 —— stopwords 介词 —— prepositions —— part of speech Named Entity Recognition (NER) 3.1 Stanfor ...
- 自然语言18_Named-entity recognition
https://en.wikipedia.org/wiki/Named-entity_recognition http://book.51cto.com/art/201107/276852.htm 命 ...
- 自然语言17_Chinking with NLTK
https://www.pythonprogramming.net/chinking-nltk-tutorial/?completed=/chunking-nltk-tutorial/ 代码 # -* ...
- 【NLP】干货!Python NLTK结合stanford NLP工具包进行文本处理
干货!详述Python NLTK下如何使用stanford NLP工具包 作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...
随机推荐
- if..elif语句
根据用户输入内容打印其权限 # alex --> 超级管理员 # eric --> 普通管理员 # tony,rain --> 业务主管 # 其他 --> 普通用户 name ...
- Java--剑指offer(4)
16.输入两个单调递增的链表,输出两个链表合成后的链表,当然我们需要合成后的链表满足单调不减规则. a)这里首先判断两个链表中有没有空表,这个就是依据表头是否为空.然后就是比较节点值的大小,然后就是使 ...
- linux 远程桌面连接
我们知道在windows下面我们可以用远程桌面连接来控制其它电脑, 但linux 远程桌面连接?不过在说怎样连接之前还是要先明确一个概念,为什么我标题没有用linux中的远程桌面连接呢, 这是因为Li ...
- LVS + Keepalived + Nginx安装及配置
1.概述 上篇文章<架构设计:负载均衡层设计方案(6)——Nginx + Keepalived构建高可用的负载层>(http://blog.csdn.net/yinwenjie/artic ...
- IOS APP开发中View的几种实现方式
xib文件有以下几个重要的属性: xib文件名 File’s Owner xib文件中的视图的Class xib文件中的视图的Outlet指向 File’s Owner 可以关联到某类,然后通过IBO ...
- Entity Framework Code First (三)Data Annotations
Entity Framework Code First 利用一种被称为约定(Conventions)优于配置(Configuration)的编程模式允许你使用自己的 domain classes 来表 ...
- 02python算法-递推
递推 1什么是递推?:根据已有节点的值,以及规律推出之后节点的值 2为什么要用递推:简单的解决有规矩事件 3怎么用?: 我们举个经典的例子: 如果1对兔子每月能生1对小兔子,而每对小兔在它出生后的第3 ...
- bzoj1051
就是一个tarjan #include<iostream> #include<stack> #include<cstdio> using namespace std ...
- 如何破解Excel文档的编辑密码
对于Excel文档我们不仅可以设置打开密码,还可以设置几天几种密码,比如编辑密码.编辑密码又称写保护密码,是一种可以限制编辑权限的密码.如果我们在日常工作中发现自己忘记了excel编辑密码的话,那就需 ...
- 美化select下拉框
直接上干货: 需要的材料: change_select.js (文末会给出下载地址) 使用方法: 1.调用方法:<script type="text/javascript" ...
