https://www.pythonprogramming.net/named-entity-recognition-nltk-tutorial/?completed=/chinking-nltk-tutorial/

Named Entity Recognition with NLTK

Recognizing Named Entity Categories

import nltk

sentence = "Bush is a pig in WhiteHouse in America."
words = nltk.word_tokenize(sentence)            # split into word tokens
tagged = nltk.pos_tag(words)                    # part-of-speech tag each token
nameEnt = nltk.ne_chunk(tagged, binary=False)   # label each entity by type
nameEnt.draw()                                  # open the tree viewer window

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

# Split the sample text into sentences
tokenized = custom_sent_tokenizer.tokenize(sample_text)

for i in tokenized[0:5]:
    words = nltk.word_tokenize(i)
    tagged = nltk.pos_tag(words)
    nameEnt = nltk.ne_chunk(tagged, binary=False)
    # print(nameEnt)
    nameEnt.draw()
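# binary=True: each chunk is only marked as a named entity or not
nameEnt = nltk.ne_chunk(tagged, binary=True)

# binary=False: each chunk is labeled with its entity type
nameEnt = nltk.ne_chunk(tagged, binary=False)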

One of the most common forms of chunking in natural language processing is "Named Entity Recognition." The idea is to have the machine immediately pull out "entities" such as people, places, things, monetary figures, and more.

This can be a bit of a challenge, but NLTK has it built in. NLTK's named entity recognition offers two main options: either mark every named entity with one generic label, or label each named entity with its type, such as person, organization, or location.

Here's an example:

import nltk
from nltk.corpus import state_union
from nltk.tokenize import PunktSentenceTokenizer

train_text = state_union.raw("2005-GWBush.txt")
sample_text = state_union.raw("2006-GWBush.txt")

custom_sent_tokenizer = PunktSentenceTokenizer(train_text)

tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        for i in tokenized[5:]:
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            namedEnt = nltk.ne_chunk(tagged, binary=True)
            namedEnt.draw()
    except Exception as e:
        print(str(e))

process_content()

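With binary=True, a chunk is either a named entity or it is not; there is no further detail. In the resulting tree, every recognized entity sits under a generic NE node.

If you set binary=False, each recognized entity is instead labeled with its type, such as PERSON, ORGANIZATION, or GPE.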

A few things stand out immediately. With binary=False, the chunker picked up the same entities but split terms like White House into "White" and "House" as if they were separate entities, whereas with binary=True it correctly treated White House as a single named entity.
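To compare the two modes without opening the drawing window, the chunks can be read straight off the tree. The sketch below is only an illustration: `extract_entities` is a hypothetical helper, and the two trees are built by hand to mimic the shape of what `nltk.ne_chunk` returns. They are not real chunker output, and the FACILITY and ORGANIZATION labels on "White" and "House" are assumed for the example.

```python
from nltk import Tree


def extract_entities(tree):
    """Return (label, text) pairs for each chunk directly under the root."""
    entities = []
    for node in tree:
        # Chunks are Tree objects; ordinary tokens are (word, tag) tuples.
        if isinstance(node, Tree):
            text = " ".join(word for word, tag in node.leaves())
            entities.append((node.label(), text))
    return entities


# Hand-built stand-in for a binary=True result (assumed, not real output).
binary_tree = Tree("S", [
    ("the", "DT"),
    Tree("NE", [("White", "NNP"), ("House", "NNP")]),
    ("in", "IN"),
    Tree("NE", [("America", "NNP")]),
])

# Hand-built stand-in for a binary=False result (labels are assumed).
typed_tree = Tree("S", [
    ("the", "DT"),
    Tree("FACILITY", [("White", "NNP")]),
    Tree("ORGANIZATION", [("House", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("America", "NNP")]),
])

print(extract_entities(binary_tree))
print(extract_entities(typed_tree))
```

In a real script, the same helper could be given the return value of `nltk.ne_chunk(tagged, binary=True)` or `nltk.ne_chunk(tagged, binary=False)` in place of the hand-built trees.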

Depending on your goals, you can set the binary option either way. With binary=False, these are the named entity types you can get:

NE Type and Examples
ORGANIZATION - Georgia-Pacific Corp., WHO
PERSON - Eddy Bonte, President Obama
LOCATION - Murray River, Mount Everest
DATE - June, 2008-06-29
TIME - two fifty a m, 1:30 p.m.
MONEY - 175 million Canadian Dollars, GBP 10.40
PERCENT - twenty pct, 18.75 %
FACILITY - Washington Monument, Stonehenge
GPE - South East Asia, Midlothian

Either way, you will probably find that you need to do a bit more work to get it just right, but this is pretty powerful right out of the box.
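As one example of that extra work, a typed result that splits a name such as White House into two neighboring chunks could be post-processed by joining adjacent chunks. This is a naive sketch, not part of NLTK: `merge_adjacent_chunks` is a hypothetical helper, the generic "NE" label for merged chunks is an assumption, and the input tree is hand-built rather than real chunker output. It will also merge neighbors that really are separate entities, so treat it as a starting point only.

```python
from nltk import Tree


def merge_adjacent_chunks(tree, merged_label="NE"):
    """Join runs of neighboring chunks into one chunk labeled merged_label."""
    merged = []
    for node in tree:
        is_chunk = isinstance(node, Tree)
        if is_chunk and merged and isinstance(merged[-1], Tree):
            # The previous item is also a chunk: combine both under one node.
            previous = merged[-1]
            merged[-1] = Tree(merged_label, list(previous) + list(node))
        elif is_chunk:
            # A chunk with no chunk before it keeps its original label.
            merged.append(Tree(node.label(), list(node)))
        else:
            # Ordinary (word, tag) tokens pass through unchanged.
            merged.append(node)
    return Tree(tree.label(), merged)


# Hand-built example tree (assumed labels, not real chunker output).
split_tree = Tree("S", [
    ("the", "DT"),
    Tree("FACILITY", [("White", "NNP")]),
    Tree("ORGANIZATION", [("House", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("America", "NNP")]),
])

merged_tree = merge_adjacent_chunks(split_tree)
print(merged_tree)
```

The function returns a new tree and leaves the input untouched, so the original typed labels remain available if the merge turns out to be too aggressive.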

In the next tutorial, we're going to talk about something similar to stemming, called lemmatizing.
