sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程)

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

QQ:231469242

欢迎nltk爱好者交流

https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/

Named Entity Recognition with NLTK

命名实体(Named Entity)类别识别

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer sentence="Bush is a pig in WhiteHouse in America."
words=nltk.word_tokenize(sentence)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False) nameEnt.draw()

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text=state_union.raw("2005-GWBush.txt")
sample_text=state_union.raw("2006-GWBush.txt") custom_sent_tokenizer=PunktSentenceTokenizer(train_text)
#分句
tokenized=custom_sent_tokenizer.tokenize(sample_text) for i in tokenized[0:5]:
words=nltk.word_tokenize(i)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False)
#print(nameEnt)
nameEnt.draw()
nameEnt=nltk.ne_chunk(tagged,binary=True)

nameEnt=nltk.ne_chunk(tagged,binary=False)

One of the most major forms of chunking in natural language processing is called "Named Entity Recognition." The idea is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.

This can be a bit of a challenge, but NLTK is this built in for us. There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc.

Here's an example:

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt") custom_sent_tokenizer = PunktSentenceTokenizer(train_text) tokenized = custom_sent_tokenizer.tokenize(sample_text) def process_content():
try:
for i in tokenized[5:]:
words = nltk.word_tokenize(i)
tagged = nltk.pos_tag(words)
namedEnt = nltk.ne_chunk(tagged, binary=True)
namedEnt.draw()
except Exception as e:
print(str(e)) process_content()

Here, with the option of binary = True, this means either something is a named entity, or not. There will be no further detail. The result is:

If you set binary = False, then the result is:

Immediately, you can see a few things. When Binary is False, it picked up the same things, but wound up splitting up terms like White House into "White" and "House" as if they were different, whereas we could see in the binary = True option, the named entity recognition was correct to say White House was part of the same named entity.

Depending on your goals, you may use the binary option how you see fit. Here are the types of Named Entities that you can get if you have binary as false:

NE Type and Examples
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian

Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.

In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.

自然语言18.1_Named Entity Recognition with NLTK的更多相关文章

  1. 自然语言18.2_NLTK命名实体识别

    QQ:231469242 欢迎nltk爱好者交流 http://blog.csdn.net/u010718606/article/details/50148261 NLTK中对于很多自然语言处理应用有 ...

  2. 自然语言12_Tokenizing Words and Sentences with NLTK

    https://www.pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/ # -*- coding: utf-8 -*- ...

  3. 自然语言处理NLP程序包(NLTK/spaCy)使用总结

    NLTK和SpaCy是NLP的Python应用,提供了一些现成的处理工具和数据接口.下面介绍它们的一些常用功能和特性,便于对NLP研究的组成形式有一个基本的了解. NLTK Natural Langu ...

  4. 自然语言27_Converting words to Features with NLTK

    https://www.pythonprogramming.net/words-as-features-nltk-tutorial/ Converting words to Features with ...

  5. 自然语言15_Part of Speech Tagging with NLTK

    https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tut ...

  6. 【448】NLP, NER, PoS

    目录: 停用词 —— stopwords 介词 —— prepositions —— part of speech Named Entity Recognition (NER) 3.1 Stanfor ...

  7. 自然语言18_Named-entity recognition

    https://en.wikipedia.org/wiki/Named-entity_recognition http://book.51cto.com/art/201107/276852.htm 命 ...

  8. 自然语言17_Chinking with NLTK

    https://www.pythonprogramming.net/chinking-nltk-tutorial/?completed=/chunking-nltk-tutorial/ 代码 # -* ...

  9. 【NLP】干货!Python NLTK结合stanford NLP工具包进行文本处理

    干货!详述Python NLTK下如何使用stanford NLP工具包 作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...

随机推荐

  1. 使用jsp/servlet简单实现文件上传与下载

    使用JSP/Servlet简单实现文件上传与下载    通过学习黑马jsp教学视频,我学会了使用jsp与servlet简单地实现web的文件的上传与下载,首先感谢黑马.好了,下面来简单了解如何通过使用 ...

  2. mysql常用方法学习

    环境 create table phople ( id int(11) not null primary key auto_increment, name char(20) not null, sex ...

  3. Linux内核参数配置

    Linux在系统运行时修改内核参数(/proc/sys与/etc/sysctl.conf),而不需要重新引导系统,这个功能是通过/proc虚拟文件系统实现的. 在/proc/sys目录下存放着大多数的 ...

  4. Android Studio 优秀插件汇总

    第一部分 插件的介绍 Google 在2013年5月的I/O开发者大会推出了基于IntelliJ IDEA java ide上的Android Studio.AndroidStudio是一个功能齐全的 ...

  5. Android 自定义Popupwindow 注意事项,手机和平板的区别

    首先自定义ppw是要继承Popupwindow 的 而要成功的显示出自定义的ppw就必须实现下面的三句代码 // 必要的三要素下面,不然popWind显示不出来 this.setContentView ...

  6. javascript 对象实例

    创建对象: var o = new Objct(); //创建一个空对象 var o = {}; var a = new Array(); //创建一个空数组 var a = []; var d = ...

  7. Windows命令 dos

    1.dos下运行netstat -na 查看本机开启的端口

  8. js 删除

    /* *  方法:Array.remove(dx) *  功能:根据元素值删除数组元素. *  参数:元素值 *  返回:在原数组上修改数组 *  作者:pxp */ Array.prototype. ...

  9. The user operation is waiting for "Building workspace" to complete

    1.选择菜单栏的“Project”,然后把菜单栏中“Build Automatically”前面的对钩去掉. 2.当你修改或添加代码后,选择菜单栏的“Project”,然后选择菜单栏中“Build A ...

  10. Spring中的Autowired注解和Resource注解的区别

    1.所属jar包不同,Autowired是Spring中的Resource是JSR-250规范定义的注解