Ref: Natural Language Toolkit

Ref: n-grams in python, four, five, six grams?

Ref: "Elegant n-gram generation in Python"

import nltk

sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good."""

# 1 gram
tokens = nltk.word_tokenize(sentence)
print("1 gram:\n", tokens, "\n")

# 2 grams
n = 2
tokens_2 = nltk.ngrams(tokens, n)
print("2 grams:\n", [i for i in tokens_2], "\n")

# 3 grams
n = 3
tokens_3 = nltk.ngrams(tokens, n)
print("3 grams:\n", [i for i in tokens_3], "\n")

# 4 grams
n = 4
tokens_4 = nltk.ngrams(tokens, n)
print("4 grams:\n", [i for i in tokens_4], "\n")

outputs:

1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']

2 grams:
[('At', 'eight'), ('eight', "o'clock"), ("o'clock", 'on'), ('on', 'Thursday'), ('Thursday', 'morning'), ('morning', 'Arthur'), ('Arthur', 'did'), ('did', "n't"), ("n't", 'feel'), ('feel', 'very'), ('very', 'good'), ('good', '.')]

3 grams:
[('At', 'eight', "o'clock"), ('eight', "o'clock", 'on'), ("o'clock", 'on', 'Thursday'), ('on', 'Thursday', 'morning'), ('Thursday', 'morning', 'Arthur'), ('morning', 'Arthur', 'did'), ('Arthur', 'did', "n't"), ('did', "n't", 'feel'), ("n't", 'feel', 'very'), ('feel', 'very', 'good'), ('very', 'good', '.')]

4 grams:
[('At', 'eight', "o'clock", 'on'), ('eight', "o'clock", 'on', 'Thursday'), ("o'clock", 'on', 'Thursday', 'morning'), ('on', 'Thursday', 'morning', 'Arthur'), ('Thursday', 'morning', 'Arthur', 'did'), ('morning', 'Arthur', 'did', "n't"), ('Arthur', 'did', "n't", 'feel'), ('did', "n't", 'feel', 'very'), ("n't", 'feel', 'very', 'good'), ('feel', 'very', 'good', '.')]

Another way to print the n-grams is to join each tuple into a single space-separated string:

import nltk

sentence = """At eight o'clock on Thursday morning
Arthur didn't feel very good."""

# 1 gram
tokens = nltk.word_tokenize(sentence)
print("1 gram:\n", tokens, "\n")

# 2 grams
n = 2
tokens_2 = nltk.ngrams(tokens, n)
print("2 grams:\n", [' '.join(list(i)) for i in tokens_2], "\n")

# 3 grams
n = 3
tokens_3 = nltk.ngrams(tokens, n)
print("3 grams:\n", [' '.join(list(i)) for i in tokens_3], "\n")

# 4 grams
n = 4
tokens_4 = nltk.ngrams(tokens, n)
print("4 grams:\n", [' '.join(list(i)) for i in tokens_4], "\n")

outputs:

1 gram:
['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']

2 grams:
['At eight', "eight o'clock", "o'clock on", 'on Thursday', 'Thursday morning', 'morning Arthur', 'Arthur did', "did n't", "n't feel", 'feel very', 'very good', 'good .']

3 grams:
["At eight o'clock", "eight o'clock on", "o'clock on Thursday", 'on Thursday morning', 'Thursday morning Arthur', 'morning Arthur did', "Arthur did n't", "did n't feel", "n't feel very", 'feel very good', 'very good .']

4 grams:
["At eight o'clock on", "eight o'clock on Thursday", "o'clock on Thursday morning", 'on Thursday morning Arthur', 'Thursday morning Arthur did', "morning Arthur did n't", "Arthur did n't feel", "did n't feel very", "n't feel very good", 'feel very good .']
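For comparison, the same n-grams can be produced without NLTK by zipping shifted slices of the token list. This is only a minimal sketch: the helper name `ngrams` is my own choice, not an NLTK function, and the token list is copied by hand from the "1 gram" output above rather than produced by `nltk.word_tokenize`.

```python
def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token list."""
    # tokens[0:], tokens[1:], ..., tokens[n-1:] are the shifted views;
    # zip stops at the shortest one, so short inputs yield no n-grams.
    return list(zip(*(tokens[i:] for i in range(n))))


# Hand-written token list (assumption: matches the tokenizer output above).
tokens = ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
          'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']

for n in range(2, 5):
    grams = ngrams(tokens, n)
    print(f"{n} grams:", len(grams), "first:", grams[0])
```

A list of N tokens gives N - n + 1 n-grams, which is why the 13 tokens here produce 12 bigrams, 11 trigrams and 10 four-grams.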

Extracting the capitalized words and phrases from a passage of text

import nltk
from nltk.corpus import stopwords
a = "I am Alex Lee. I am from Denman Prospect and I love this place very much. We don't like apple. The big one is good."
tokens = nltk.word_tokenize(a)
caps = []
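# Look at every 1-, 2- and 3-gram
for i in range(1, 4):
    for eles in nltk.ngrams(tokens, i):
        length = len(list(eles))
        for j in range(length):
            # Reject the n-gram as soon as one token does not start with a capital letter
            if eles[j][0].islower() or not eles[j][0].isalpha():
                break
            elif j == length - 1:
                # Every token passed the check, so keep the whole n-gram
                caps.append(' '.join(list(eles)))

# Remove duplicates, then drop capitalized stopwords such as "I", "We" and "The"
caps = list(set(caps))
caps = [c for c in caps if c.lower() not in stopwords.words('english')]
print(caps)

outputs (the order may vary, because set() is unordered):

['Denman', 'Prospect', 'Alex Lee', 'Lee', 'Alex', 'Denman Prospect']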
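If a reproducible order matters, the same idea can be sketched with a dict in place of a set, since dict keys keep insertion order. The function name `capitalized_phrases`, the shortened token list and the tiny `stop_words` set below are all hypothetical stand-ins for illustration; in the code above the tokens come from `nltk.word_tokenize` and the stopwords from `stopwords.words('english')`.

```python
def capitalized_phrases(tokens, stop_words, max_n=3):
    """Collect runs of 1..max_n capitalized tokens, in first-seen order."""
    found = {}  # dict keys drop duplicates and keep insertion order
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            # Same test as the loop above: the first character must be a
            # letter and must not be lowercase (tokens assumed non-empty).
            if all(t[0].isalpha() and not t[0].islower() for t in window):
                found[' '.join(window)] = None
    return [p for p in found if p.lower() not in stop_words]


# Hypothetical, hand-written inputs (not real NLTK output or corpus data).
tokens = ['I', 'am', 'Alex', 'Lee', '.', 'I', 'am', 'from', 'Denman',
          'Prospect', 'and', 'I', 'love', 'this', 'place', '.',
          'The', 'big', 'one', 'is', 'good', '.']
stop_words = {'i', 'am', 'from', 'and', 'this', 'the', 'is'}

print(capitalized_phrases(tokens, stop_words))
# ['Alex', 'Lee', 'Denman', 'Prospect', 'Alex Lee', 'Denman Prospect']
```

Passing the stopwords in as a set also avoids re-reading the stopword list for every candidate phrase.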
