sklearn实战-乳腺癌细胞数据挖掘(博主亲自录制视频教程)

https://study.163.com/course/introduction.htm?courseId=1005269003&utm_campaign=commission&utm_source=cp-400000000398149&utm_medium=share

QQ:231469242

欢迎nltk爱好者交流

https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/

Named Entity Recognition with NLTK

命名实体(Named Entity)类别识别

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer sentence="Bush is a pig in WhiteHouse in America."
words=nltk.word_tokenize(sentence)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False) nameEnt.draw()

This is a temporary script file.
""" import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text=state_union.raw("2005-GWBush.txt")
sample_text=state_union.raw("2006-GWBush.txt") custom_sent_tokenizer=PunktSentenceTokenizer(train_text)
#分句
tokenized=custom_sent_tokenizer.tokenize(sample_text) for i in tokenized[0:5]:
words=nltk.word_tokenize(i)
tagged=nltk.pos_tag(words)
nameEnt=nltk.ne_chunk(tagged,binary=False)
#print(nameEnt)
nameEnt.draw()
nameEnt=nltk.ne_chunk(tagged,binary=True)

nameEnt=nltk.ne_chunk(tagged,binary=False)

One of the most major forms of chunking in natural language processing is called "Named Entity Recognition." The idea is to have the machine immediately be able to pull out "entities" like people, places, things, locations, monetary figures, and more.

This can be a bit of a challenge, but NLTK is this built in for us. There are two major options with NLTK's named entity recognition: either recognize all named entities, or recognize named entities as their respective type, like people, places, locations, etc.

Here's an example:

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt") custom_sent_tokenizer = PunktSentenceTokenizer(train_text) tokenized = custom_sent_tokenizer.tokenize(sample_text) def process_content():
try:
for i in tokenized[5:]:
words = nltk.word_tokenize(i)
tagged = nltk.pos_tag(words)
namedEnt = nltk.ne_chunk(tagged, binary=True)
namedEnt.draw()
except Exception as e:
print(str(e)) process_content()

Here, with the option of binary = True, this means either something is a named entity, or not. There will be no further detail. The result is:

If you set binary = False, then the result is:

Immediately, you can see a few things. When Binary is False, it picked up the same things, but wound up splitting up terms like White House into "White" and "House" as if they were different, whereas we could see in the binary = True option, the named entity recognition was correct to say White House was part of the same named entity.

Depending on your goals, you may use the binary option how you see fit. Here are the types of Named Entities that you can get if you have binary as false:

NE Type and Examples
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian

Either way, you will probably find that you need to do a bit more
work to get it just right, but this is pretty powerful right out of the
box.

In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.

自然语言18.1_Named Entity Recognition with NLTK的更多相关文章

  1. 自然语言18.2_NLTK命名实体识别

    QQ:231469242 欢迎nltk爱好者交流 http://blog.csdn.net/u010718606/article/details/50148261 NLTK中对于很多自然语言处理应用有 ...

  2. 自然语言12_Tokenizing Words and Sentences with NLTK

    https://www.pythonprogramming.net/tokenizing-words-sentences-nltk-tutorial/ # -*- coding: utf-8 -*- ...

  3. 自然语言处理NLP程序包(NLTK/spaCy)使用总结

    NLTK和SpaCy是NLP的Python应用,提供了一些现成的处理工具和数据接口.下面介绍它们的一些常用功能和特性,便于对NLP研究的组成形式有一个基本的了解. NLTK Natural Langu ...

  4. 自然语言27_Converting words to Features with NLTK

    https://www.pythonprogramming.net/words-as-features-nltk-tutorial/ Converting words to Features with ...

  5. 自然语言15_Part of Speech Tagging with NLTK

    https://www.pythonprogramming.net/part-of-speech-tagging-nltk-tutorial/?completed=/stemming-nltk-tut ...

  6. 【448】NLP, NER, PoS

    目录: 停用词 —— stopwords 介词 —— prepositions —— part of speech Named Entity Recognition (NER) 3.1 Stanfor ...

  7. 自然语言18_Named-entity recognition

    https://en.wikipedia.org/wiki/Named-entity_recognition http://book.51cto.com/art/201107/276852.htm 命 ...

  8. 自然语言17_Chinking with NLTK

    https://www.pythonprogramming.net/chinking-nltk-tutorial/?completed=/chunking-nltk-tutorial/ 代码 # -* ...

  9. 【NLP】干货!Python NLTK结合stanford NLP工具包进行文本处理

    干货!详述Python NLTK下如何使用stanford NLP工具包 作者:白宁超 2016年11月6日19:28:43 摘要:NLTK是由宾夕法尼亚大学计算机和信息科学使用python语言实现的 ...

随机推荐

  1. 十天冲刺---Day3

    站立式会议 站立式会议内容总结: git上Issues新增内容: 燃尽图 照片 组长情绪爆炸是很可怕的事情.这里自责一下. 进度缓慢是一件非常头疼的事情.还有每个人的时间都很紧张,除了学习,还有各种工 ...

  2. 提高你的数据库编程效率:Microsoft CLR Via Sql Server

    你还在为数据库编程而抓狂吗?那些恶心的脚本拼接,低效的脚本调试的日子将会与我们越来越远啦.现在我们能用支持.NET的语言来开发数据库中的对象,如:存储过程,函数,触发器,集合函数已及复杂的类型.看到这 ...

  3. iPad开发--iPad中modal的更多用法

    可以设置modal的呈现样式,常见的有以下四种                                   设置modal的过度样式,也就是展现时候的动画效果 代码示例

  4. Jenkins_获取源码编译并启动服务(二)

    一.创建Maven项目   二.设置SVN信息   三.设置构建触发器   四.设置Maven命令   五.设置构建后发邮件信息(参考文章一) 六.设置构建后拷贝文件到远程机器并执行命令   来自为知 ...

  5. 【POJ 1556】The Doors 判断线段相交+SPFA

    黑书上的一道例题:如果走最短路则会碰到点,除非中间没有障碍. 这样把能一步走到的点两两连边,然后跑SPFA即可. #include<cmath> #include<cstdio> ...

  6. java-读取类中的属性名称和值

    方法 /** * 获取类中的所有属性明名称和值(因涉及到可能会是继承关系的父类,所以从f中去属性名称,从f2中取值,两个可以一样,也可以使父类) * @param f:读取属性类(如果取父类的,则这里 ...

  7. mysql解决无法远程客户端连接

    1. 改表法.可能是你的帐号不允许从远程登陆,只能在localhost.这个时候只要在localhost的那台电脑,登入mysql后,更改 "mysql" 数据库里的 " ...

  8. Java 代码编译和执行的整个过程

    Java 代码编译是由 Java 源码编译器来完成,流程图如下所示: Java 字节码的执行是由 JVM 执行引擎来完成,流程图如下所示: Java 代码编译和执行的整个过程包含了以下三个重要的机制: ...

  9. js 中 == 和=== 有什么区别?

    第一个是相等符:第二个全等符: 其中第一个在比较的时候,会进行类型转换,而第二个则不会, alert('55' == 55);//truealert('55' === 55);//false

  10. break 的一个“高级用法”(转)

    转载:http://blog.csdn.net/lovelan1748/article/details/5321558 本小节不是很适于没有多少实际编程经历的初学者,所以初学者可以跳过,以后再回头阅读 ...