【389】Implement N-grams using NLTK
Ref: n-grams in python, four, five, six grams?
Ref: "Elegant n-gram generation in Python"
import nltk sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good.""" # 1 gram tokens = nltk.word_tokenize(sentence) print("1 gram:\n", tokens, "\n") # 2 grams n = 2 tokens_2 = nltk.ngrams(tokens, n) print("2 grams:\n", [i for i in tokens_2], "\n") # 3 grams n = 3 tokens_3 = nltk.ngrams(tokens, n) print("3 grams:\n", [i for i in tokens_3], "\n") # 4 grams n = 4 tokens_4 = nltk.ngrams(tokens, n) print("4 grams:\n", [i for i in tokens_4], "\n") outputs:
1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.'] 2 grams:
[('At', 'eight'), ('eight', "o'clock"), ("o'clock", 'on'), ('on', 'Thursday'), ('Thursday', 'morning'), ('morning', 'Arthur'), ('Arthur', 'did'), ('did', "n't"), ("n't", 'feel'), ('feel', 'very'), ('very', 'good'), ('good', '.')] 3 grams:
[('At', 'eight', "o'clock"), ('eight', "o'clock", 'on'), ("o'clock", 'on', 'Thursday'), ('on', 'Thursday', 'morning'), ('Thursday', 'morning', 'Arthur'), ('morning', 'Arthur', 'did'), ('Arthur', 'did', "n't"), ('did', "n't", 'feel'), ("n't", 'feel', 'very'), ('feel', 'very', 'good'), ('very', 'good', '.')] 4 grams:
[('At', 'eight', "o'clock", 'on'), ('eight', "o'clock", 'on', 'Thursday'), ("o'clock", 'on', 'Thursday', 'morning'), ('on', 'Thursday', 'morning', 'Arthur'), ('Thursday', 'morning', 'Arthur', 'did'), ('morning', 'Arthur', 'did', "n't"), ('Arthur', 'did', "n't", 'feel'), ('did', "n't", 'feel', 'very'), ("n't", 'feel', 'very', 'good'), ('feel', 'very', 'good', '.')]
Another method to output:
import nltk sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good.""" # 1 gram tokens = nltk.word_tokenize(sentence) print("1 gram:\n", tokens, "\n") # 2 grams n = 2 tokens_2 = nltk.ngrams(tokens, n) print("2 grams:\n", [' '.join(list(i)) for i in tokens_2], "\n") # 3 grams n = 3 tokens_3 = nltk.ngrams(tokens, n) print("3 grams:\n", [' '.join(list(i)) for i in tokens_3], "\n") # 4 grams n = 4 tokens_4 = nltk.ngrams(tokens, n) print("4 grams:\n", [' '.join(list(i)) for i in tokens_4], "\n") outputs:
1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.'] 2 grams:
['At eight', "eight o'clock", "o'clock on", 'on Thursday', 'Thursday morning', 'morning Arthur', 'Arthur did', "did n't", "n't feel", 'feel very', 'very good', 'good .'] 3 grams:
["At eight o'clock", "eight o'clock on", "o'clock on Thursday", 'on Thursday morning', 'Thursday morning Arthur', 'morning Arthur did', "Arthur did n't", "did n't feel", "n't feel very", 'feel very good', 'very good .'] 4 grams:
["At eight o'clock on", "eight o'clock on Thursday", "o'clock on Thursday morning", 'on Thursday morning Arthur', 'Thursday morning Arthur did', "morning Arthur did n't", "Arthur did n't feel", "did n't feel very", "n't feel very good", 'feel very good .']
获取一段文字中的大写字母开头的词组和单词
import nltk
from nltk.corpus import stopwords
a = "I am Alex Lee. I am from Denman Prospect and I love this place very much. We don't like apple. The big one is good."
tokens = nltk.word_tokenize(a)
caps = []
for i in range(1, 4):
for eles in nltk.ngrams(tokens, i):
length = len(list(eles))
for j in range(length):
if eles[j][0].islower() or not eles[j][0].isalpha():
break
elif j == length - 1:
caps.append(' '.join(list(eles))) caps = list(set(caps))
caps = [c for c in caps if c.lower() not in stopwords.words('english')]
print(caps) outputs:
['Denman', 'Prospect', 'Alex Lee', 'Lee', 'Alex', 'Denman Prospect']
【389】Implement N-grams using NLTK的更多相关文章
- 【leetcode】Implement strStr() (easy)
Implement strStr(). Returns the index of the first occurrence of needle in haystack, or -1 if needle ...
- 【leetcode】Implement strStr()
Implement strStr() Implement strStr(). Returns the index of the first occurrence of needle in haysta ...
- 【Leetcode】【Easy】Implement strStr()
Implement strStr(). Returns the index of the first occurrence of needle in haystack, or -1 if needle ...
- 【LeetCode225】 Implement Stack using Queues★
1.题目 2.思路 3.java代码 import java.util.LinkedList; import java.util.Queue; public class MyStack { priva ...
- 【LeetCode232】 Implement Queue using Stacks★
1.题目描述 2.思路 思路简单,这里用一个图来举例说明: 3.java代码 public class MyQueue { Stack<Integer> stack1=new Stack& ...
- 【LeetCode】Implement strStr()(实现strStr())
这道题是LeetCode里的第28道题. 题目描述: 实现 strStr() 函数. 给定一个 haystack 字符串和一个 needle 字符串,在 haystack 字符串中找出 needle ...
- 28. Implement strStr()【easy】
28. Implement strStr()[easy] Implement strStr(). Returns the index of the first occurrence of needle ...
- 【LeetCode】哈希表 hash_table(共88题)
[1]Two Sum (2018年11月9日,k-sum专题,算法群衍生题) 给了一个数组 nums, 和一个 target 数字,要求返回一个下标的 pair, 使得这两个元素相加等于 target ...
- 【LeetCode】String to Integer (atoi) 解题报告
这道题在LeetCode OJ上难道属于Easy.可是通过率却比較低,究其原因是须要考虑的情况比較低,非常少有人一遍过吧. [题目] Implement atoi to convert a strin ...
随机推荐
- 关于把Json数据绑定到select2中
最近做的一个项目中用到select2,想把Json的数据绑定到select2中,select2默认的能够接受的json格式的数据是以{id:"",text:''}这样的键值对来保存 ...
- SQL with(unlock)与with(readpast)
所有Select加 With (NoLock)解决阻塞死锁,在查询语句中使用 NOLOCK 和 READPAST 处理一个数据库死锁的异常时候,其中一个建议就是使用 NOLOCK 或者 READPAS ...
- vue项目,npm install后,npm run dev报错问题
报错: ERR! code ELIFECYCLE npm ERR! errno 1 npm ERR! metools@1.0.0 dev: `node build/dev-server.js` npm ...
- Linux free -m 详解命令
如下显示free是显示的当前内存的使用,-m的意思是M字节来显示内容.我们来一起看看. 1 2 3 4 5 6 $ free -m total used ...
- spring揭密学习笔记
spring揭密学习笔记 spring揭密学习笔记(1) --spring的由来 spring揭密学习笔记(2)-spring ioc容器:IOC的基本概念
- tornado-cookies+pycket 验证
1.pip install pycket pip install redis 2.config settings = dict( debug=True, template_path='template ...
- Git工作流基础简介【与产品经理.jpg】
基于可视化界面的操作可使用Sourcetree这个软件进行操作. 下面将描绘的几个命令主要是 git init git add git commit git status git reset HEAD ...
- ROS学习手记 - 2.1: Create and Build ROS Package 生成包(Python)
ROS学习手记 - 2.1: Create and Build ROS Package 生成包(Python) 时隔1年,再回来总结这个问题,因为它是ros+python开发中,太常用的一个操作,需要 ...
- ROS学习手记 - 7 创建ROS msg & srv
至此,我们初步学习了ROS的基本工具,接下来一步步理解ROS的各个工作部件的创建和工作原理. 本文的详细文档:http://wenku.baidu.com/view/623f41b3376baf1ff ...
- 2014最新 iOS App 提交上架store 详细流程
http://blog.csdn.net/tt5267621/article/details/39430659