Ref: Natural Language Toolkit

Ref: n-grams in python, four, five, six grams?

Ref: "Elegant n-gram generation in Python"

import nltk

sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good.""" # 1 gram tokens = nltk.word_tokenize(sentence) print("1 gram:\n", tokens, "\n") # 2 grams n = 2 tokens_2 = nltk.ngrams(tokens, n) print("2 grams:\n", [i for i in tokens_2], "\n") # 3 grams n = 3 tokens_3 = nltk.ngrams(tokens, n) print("3 grams:\n", [i for i in tokens_3], "\n") # 4 grams n = 4 tokens_4 = nltk.ngrams(tokens, n) print("4 grams:\n", [i for i in tokens_4], "\n") outputs:
1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.'] 2 grams:
[('At', 'eight'), ('eight', "o'clock"), ("o'clock", 'on'), ('on', 'Thursday'), ('Thursday', 'morning'), ('morning', 'Arthur'), ('Arthur', 'did'), ('did', "n't"), ("n't", 'feel'), ('feel', 'very'), ('very', 'good'), ('good', '.')] 3 grams:
[('At', 'eight', "o'clock"), ('eight', "o'clock", 'on'), ("o'clock", 'on', 'Thursday'), ('on', 'Thursday', 'morning'), ('Thursday', 'morning', 'Arthur'), ('morning', 'Arthur', 'did'), ('Arthur', 'did', "n't"), ('did', "n't", 'feel'), ("n't", 'feel', 'very'), ('feel', 'very', 'good'), ('very', 'good', '.')] 4 grams:
[('At', 'eight', "o'clock", 'on'), ('eight', "o'clock", 'on', 'Thursday'), ("o'clock", 'on', 'Thursday', 'morning'), ('on', 'Thursday', 'morning', 'Arthur'), ('Thursday', 'morning', 'Arthur', 'did'), ('morning', 'Arthur', 'did', "n't"), ('Arthur', 'did', "n't", 'feel'), ('did', "n't", 'feel', 'very'), ("n't", 'feel', 'very', 'good'), ('feel', 'very', 'good', '.')]

Another method to output:

import nltk

sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good.""" # 1 gram tokens = nltk.word_tokenize(sentence) print("1 gram:\n", tokens, "\n") # 2 grams n = 2 tokens_2 = nltk.ngrams(tokens, n) print("2 grams:\n", [' '.join(list(i)) for i in tokens_2], "\n") # 3 grams n = 3 tokens_3 = nltk.ngrams(tokens, n) print("3 grams:\n", [' '.join(list(i)) for i in tokens_3], "\n") # 4 grams n = 4 tokens_4 = nltk.ngrams(tokens, n) print("4 grams:\n", [' '.join(list(i)) for i in tokens_4], "\n") outputs:
1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.'] 2 grams:
['At eight', "eight o'clock", "o'clock on", 'on Thursday', 'Thursday morning', 'morning Arthur', 'Arthur did', "did n't", "n't feel", 'feel very', 'very good', 'good .'] 3 grams:
["At eight o'clock", "eight o'clock on", "o'clock on Thursday", 'on Thursday morning', 'Thursday morning Arthur', 'morning Arthur did', "Arthur did n't", "did n't feel", "n't feel very", 'feel very good', 'very good .'] 4 grams:
["At eight o'clock on", "eight o'clock on Thursday", "o'clock on Thursday morning", 'on Thursday morning Arthur', 'Thursday morning Arthur did', "morning Arthur did n't", "Arthur did n't feel", "did n't feel very", "n't feel very good", 'feel very good .']

获取一段文字中的大写字母开头的词组和单词

import nltk
from nltk.corpus import stopwords
a = "I am Alex Lee. I am from Denman Prospect and I love this place very much. We don't like apple. The big one is good."
tokens = nltk.word_tokenize(a)
caps = []
for i in range(1, 4):
for eles in nltk.ngrams(tokens, i):
length = len(list(eles))
for j in range(length):
if eles[j][0].islower() or not eles[j][0].isalpha():
break
elif j == length - 1:
caps.append(' '.join(list(eles))) caps = list(set(caps))
caps = [c for c in caps if c.lower() not in stopwords.words('english')]
print(caps) outputs:
['Denman', 'Prospect', 'Alex Lee', 'Lee', 'Alex', 'Denman Prospect']

【389】Implement N-grams using NLTK的更多相关文章

  1. 【leetcode】Implement strStr() (easy)

    Implement strStr(). Returns the index of the first occurrence of needle in haystack, or -1 if needle ...

  2. 【leetcode】Implement strStr()

    Implement strStr() Implement strStr(). Returns the index of the first occurrence of needle in haysta ...

  3. 【Leetcode】【Easy】Implement strStr()

    Implement strStr(). Returns the index of the first occurrence of needle in haystack, or -1 if needle ...

  4. 【LeetCode225】 Implement Stack using Queues★

    1.题目 2.思路 3.java代码 import java.util.LinkedList; import java.util.Queue; public class MyStack { priva ...

  5. 【LeetCode232】 Implement Queue using Stacks★

    1.题目描述 2.思路 思路简单,这里用一个图来举例说明: 3.java代码 public class MyQueue { Stack<Integer> stack1=new Stack& ...

  6. 【LeetCode】Implement strStr()(实现strStr())

    这道题是LeetCode里的第28道题. 题目描述: 实现 strStr() 函数. 给定一个 haystack 字符串和一个 needle 字符串,在 haystack 字符串中找出 needle ...

  7. 28. Implement strStr()【easy】

    28. Implement strStr()[easy] Implement strStr(). Returns the index of the first occurrence of needle ...

  8. 【LeetCode】哈希表 hash_table(共88题)

    [1]Two Sum (2018年11月9日,k-sum专题,算法群衍生题) 给了一个数组 nums, 和一个 target 数字,要求返回一个下标的 pair, 使得这两个元素相加等于 target ...

  9. 【LeetCode】String to Integer (atoi) 解题报告

    这道题在LeetCode OJ上难道属于Easy.可是通过率却比較低,究其原因是须要考虑的情况比較低,非常少有人一遍过吧. [题目] Implement atoi to convert a strin ...

随机推荐

  1. 2017湖湘杯复赛writeup

    2017湖湘杯复赛writeup 队伍名:China H.L.B 队伍同时在打 X-NUCA  和 湖湘杯的比赛,再加上周末周末周末啊,陪女朋友逛街吃饭看电影啊.所以精力有点分散,做出来部分题目,现在 ...

  2. 使用fakeroot模拟root权限执行程序(转)

    Hack #57: 使用fakeroot模拟root权限执行程序 fakeroot是什么 例如Debian在生成package的时候,编译完之后,不能立刻在当前环境执行make install,需要执 ...

  3. 第7课 列表初始化(2)_分析initializer_list<T>的实现

    1. 初始化列表的实现 (1)当编译器看到{t1,t2…tn}时便会生成一个initializer_list<T>对象(其中的T为元素的类型),它关联到一个array<T,n> ...

  4. 第11章 拾遗1:网络地址转换(NAT)和端口映射

    1. 网络地址转换(NAT) 1.1 NAT的应用场景 (1)应用场景:允许将私有IP地址映射到公网地址,以减缓IP地址空间的消耗 ①需要连接Internet,但主机没有公网IP地址 ②更换了一个新的 ...

  5. SqlServer存储过程输出参数

    if exists(select 1 from sysobjects where name='P_PreOrderInfo') drop Procedure P_PreOrderInfo go Cre ...

  6. kafka的几个简单操作

    怎么安装解压kafka这里就不多说了,从配置文件说起 我这里搭建的是三节点集群 master  slave1 slave2 修改server.properties 文件 把自己本地安装的zookeep ...

  7. .net百度编辑器的使用

    1.前端引用 <%@ Page ValidateRequest="false" Language="C#" AutoEventWireup="t ...

  8. CF603EPastoral Oddities

    /* LCT管子题(说的就是你 水管局长) 首先能得到一个结论, 那就是当且仅当所有联通块都是偶数时存在构造方案 LCT动态加边, 维护最小生成联通块, 用set维护可以删除的边, 假如现在删除后不影 ...

  9. day2作业(基本数据类型,语法)

    #coding:utf-8 '''1.写代码实现用户输入用户名和密码,当用户名为 seven 且 密码为 123 时,显示登陆成功,否则登陆失败!实现用户输入用户名和密码,当用户名为 seven 且 ...

  10. CSS3的过渡和转换

    CSS3的过渡和转换 1.过渡 什么是过渡呢?过渡通俗的来说就是从一个样式到另一个样式的逐渐转换改变的效果. 过渡的属性: 属性 描述 css transition 简写属性,用于在一个属性中设置4个 ...