Python third-party library wordcloud (word cloud): quick start and beyond

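Preface:

My development environment: Python 3 + Windows

Beginners are advised to set up their Python environment with Anaconda. It is convenient, and it saves time that would otherwise go into installing packages one by one.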

Introduction:

wordcloud is a small word cloud generator for Python.

GitHub: https://github.com/amueller/word_cloud

Official site: https://amueller.github.io/word_cloud/

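Installation:

1–Install with conda (requires Anaconda; this is the recommended method):

conda install -c conda-forge wordcloud

2–Install with pip (my development environment is Windows; the first time I installed this way it failed with an error, and none of the fixes I found online resolved it):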

pip install wordcloud
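
Whichever method is used, it can help to confirm that the package is visible to the interpreter you are actually running. The snippet below is a minimal sketch of such a check; the helper name `is_installed` is my own and not part of wordcloud.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("wordcloud"):
    import wordcloud
    # __version__ is read defensively in case a build does not expose it.
    print("wordcloud version:", getattr(wordcloud, "__version__", "unknown"))
else:
    print("wordcloud is not installed in this environment")
```

If the second message appears even though the install command succeeded, the package was probably installed into a different Python environment than the one running the script.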

Examples

1–Getting-started example

#!/usr/bin/env python
"""
Minimal Example
===============

Generate a square word cloud from the US Constitution using default arguments.
"""

from os import path

from wordcloud import WordCloud

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'constitution.txt')).read()

# Generate a word cloud image.
wordcloud = WordCloud().generate(text)

# Display the generated image the matplotlib way.
import matplotlib.pyplot as plt

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

# max_font_size sets the largest font size used in the cloud.
# width, height and margin control the properties of the image.
# generate() tokenizes the whole text automatically, but it handles Chinese poorly.
wordcloud = WordCloud(max_font_size=66).generate(text)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()

# Display the generated image the PIL way (if you do not have matplotlib).
# image = wordcloud.to_image()
# image.show()
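
The comment in the example above notes that generate() does not cope well with Chinese. One possible workaround, sketched here under assumptions, is to segment the text yourself, count the words, and hand the counts to `generate_from_frequencies` together with a font that contains CJK glyphs. The helper names `count_words` and `render_cloud`, the hand-written token list, and the `font_path` argument are all illustrative; the segmentation step itself (for example with a Chinese word segmenter) is not shown.

```python
from collections import Counter


def count_words(tokens, stopwords=frozenset(), min_length=2):
    """Count tokens, skipping stopwords and tokens shorter than min_length."""
    return Counter(
        tok for tok in tokens
        if len(tok) >= min_length and tok not in stopwords
    )


def render_cloud(frequencies, font_path):
    """Build a word cloud from precomputed counts (requires wordcloud)."""
    # Imported lazily so that count_words can be used without wordcloud.
    from wordcloud import WordCloud
    return WordCloud(font_path=font_path).generate_from_frequencies(frequencies)


# Tokens as a segmenter might return them (hand-written for illustration).
tokens = ["词云", "生成", "词云", "的", "图像", "词云", "生成"]
freqs = count_words(tokens, stopwords={"的"})
print(freqs.most_common(2))  # [('词云', 3), ('生成', 2)]
```

Calling `render_cloud(freqs, font_path)` with the path to a CJK-capable font file on your machine would then produce the cloud object, which can be displayed exactly as in the example above.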

2–Using a mask image to generate a word cloud of any shape

#!/usr/bin/env python
"""
Masked wordcloud
================

Using a mask image you can generate a word cloud in an arbitrary shape.
"""

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# Read the mask image (source: http://www.stencilry.org/stencils/movies/alice%20in%20wonderland/255fk.jpg).
alice_mask = np.array(Image.open(path.join(d, "alice_mask.png")))

# Start from the built-in stopwords and add one specific to this text.
stopwords = set(STOPWORDS)
stopwords.add("said")

# Configure the word cloud.
wc = WordCloud(background_color="white", max_words=2000, mask=alice_mask,
               stopwords=stopwords)

# Generate the word cloud.
wc.generate(text)

# Save it to a local file.
wc.to_file(path.join(d, "alice.png"))

# Show the word cloud, then the mask itself in a second figure.
plt.imshow(wc, interpolation='bilinear')
plt.axis("off")
plt.figure()
plt.imshow(alice_mask, cmap=plt.cm.gray, interpolation='bilinear')
plt.axis("off")
plt.show()
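
A mask does not have to come from an image file. In wordcloud, pure white entries of the mask array are treated as masked out and every other value is free to draw on, so a shape can also be built directly with NumPy. The following is a small sketch along those lines; the function name `circle_mask` and the size of 300 pixels are arbitrary choices of mine.

```python
import numpy as np


def circle_mask(size):
    """Return a (size, size) uint8 array: 0 inside a centred circle, 255 outside."""
    y, x = np.ogrid[:size, :size]
    centre = (size - 1) / 2
    radius = size / 2
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    # 255 (white) marks pixels where no words may be drawn.
    return np.where(outside, 255, 0).astype(np.uint8)


mask = circle_mask(300)
# With wordcloud installed and `text` loaded, the mask could be used like this:
# wc = WordCloud(background_color="white", mask=mask).generate(text)
```

The resulting array plays the same role as `alice_mask` in the example above.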

3–Coloring

#!/usr/bin/env python
"""
Using custom colors
===================

Using the recolor method and custom coloring functions.
"""

import numpy as np
from PIL import Image
from os import path
import matplotlib.pyplot as plt
import random

from wordcloud import WordCloud, STOPWORDS


def grey_color_func(word, font_size, position, orientation, random_state=None,
                    **kwargs):
    # Return a random shade of light grey (lightness 60-100%) for each word.
    return "hsl(0, 0%%, %d%%)" % random.randint(60, 100)


d = path.dirname(__file__)

# Read the mask image (source: http://www.stencilry.org/stencils/movies/star%20wars/storm-trooper.gif).
mask = np.array(Image.open(path.join(d, "stormtrooper_mask.png")))

# Text source: the movie script of "A New Hope" (http://www.imsdb.com/scripts/Star-Wars-A-New-Hope.html).
text = open(path.join(d, 'a_new_hope.txt')).read()

# Preprocess the text a little.
text = text.replace("HAN", "Han")
text = text.replace("LUKE'S", "Luke")

# Add stopwords that are specific to movie scripts.
stopwords = set(STOPWORDS)
stopwords.add("int")
stopwords.add("ext")

wc = WordCloud(max_words=1000, mask=mask, stopwords=stopwords, margin=10,
               random_state=1).generate(text)

# Keep a copy of the image with the default colors.
default_colors = wc.to_array()

plt.title("Custom colors")
plt.imshow(wc.recolor(color_func=grey_color_func, random_state=3),
           interpolation="bilinear")
wc.to_file("a_new_hope.png")
plt.axis("off")
plt.figure()
plt.title("Default colors")
plt.imshow(default_colors, interpolation="bilinear")
plt.axis("off")
plt.show()
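
One detail worth noting about grey_color_func above: it draws from the global `random` module, so the `random_state` passed to `recolor` does not influence the shades. If reproducible colors are wanted, a possible variant (a sketch, not the library's own code) is to draw from the `random_state` argument when one is supplied. It assumes that argument is a `random.Random` instance.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Return a light-grey HSL color string, reproducible when seeded."""
    # Fall back to the global random module if no generator is supplied.
    rng = random_state if random_state is not None else random
    return "hsl(0, 0%%, %d%%)" % rng.randint(60, 100)


# Two generators with the same seed yield the same shade.
first = grey_color_func("Luke", 40, (0, 0), None, random_state=random.Random(3))
second = grey_color_func("Luke", 40, (0, 0), None, random_state=random.Random(3))
print(first, first == second)
```

This version can be passed to `wc.recolor(color_func=grey_color_func, random_state=3)` in the same way as the original.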

#!/usr/bin/env python
"""
Image-colored wordcloud
=======================

You can color a word cloud with an image-based coloring strategy, implemented in
ImageColorGenerator. It uses the average color of the region that each word
occupies in a source image.
"""

from os import path
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

d = path.dirname(__file__)

# Read the whole text.
text = open(path.join(d, 'alice.txt')).read()

# Read the mask / color image (downloaded from http://jirkavinse.deviantart.com/art/quot-Real-Life-quot-Alice-282261010).
alice_coloring = np.array(Image.open(path.join(d, "alice_color.png")))
stopwords = set(STOPWORDS)
stopwords.add("said")

wc = WordCloud(background_color="white", max_words=2000, mask=alice_coloring,
               stopwords=stopwords, max_font_size=40, random_state=42)

# Generate the word cloud.
wc.generate(text)

# Create the coloring from the image.
image_colors = ImageColorGenerator(alice_coloring)

# Show the word cloud with its default colors.
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.figure()

# Recolor the word cloud and show it.
# We could also pass color_func=image_colors directly to the constructor.
plt.imshow(wc.recolor(color_func=image_colors), interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.figure()
plt.imshow(alice_coloring, cmap=plt.cm.gray, interpolation="bilinear")
plt.axis("off")  # hide the axes
plt.show()  # draws all three figures at once

Another coloring example:

colored_by_group.py
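
The file itself is not reproduced here. As a rough idea of what group-based coloring can look like, the sketch below assigns a fixed color to each group of words and a default color to everything else. It is a simplified illustration written for this post, not the contents of colored_by_group.py; the class name `GroupColorFunc` and the sample words are my own.

```python
class GroupColorFunc:
    """Color function that assigns one fixed color per group of words."""

    def __init__(self, color_to_words, default_color):
        # Invert {color: [words]} into {word: color} for fast lookup.
        self.word_to_color = {
            word: color
            for color, words in color_to_words.items()
            for word in words
        }
        self.default_color = default_color

    def __call__(self, word, **kwargs):
        # Extra keyword arguments (font_size, position, ...) are ignored.
        return self.word_to_color.get(word, self.default_color)


color_func = GroupColorFunc(
    {"green": ["beautiful", "simple"], "red": ["ugly", "complex"]},
    default_color="grey",
)
print(color_func(word="simple"), color_func(word="python"))  # green grey
# With a generated cloud `wc`: wc.recolor(color_func=color_func)
```

Because the object is callable and accepts the word as a keyword argument, it can be used wherever the earlier examples pass a `color_func`.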

4–Emoji

#!/usr/bin/env python
"""
Emoji Example
===============

A simple example that shows how to include emoji. Note that this example does
not seem to work on OS X, but does work correctly on Ubuntu.

There are 3 important steps to follow to include emoji:
1) Read the text input as UTF-8. In Python 3, io.open is the same function as
   the built-in open, so pass encoding='utf-8' explicitly instead of relying on
   the platform default.
2) Override the regular expression used by word cloud to parse the text into
   words. The default expression will only match ascii words.
3) Override the default font with one that supports emoji. The included Symbola
   font contains black and white outlines for most emoji. There is currently an
   issue in the PIL/Pillow library that seems to prevent it from working
   correctly on OS X (https://github.com/python-pillow/Pillow/issues/1774).
   If you run into problems, try running it on Ubuntu.
"""

import io
import string
from os import path

from wordcloud import WordCloud

d = path.dirname(__file__)

# It is important that the file is loaded as UTF-8.
text = io.open(path.join(d, 'happy-emoji.txt'), encoding='utf-8').read()

# The regex used to detect words is a combination of normal words, ascii art, and emojis.
# 2+ consecutive letters (also include apostrophes), e.g. It's
normal_word = r"(?:\w[\w']+)"
# 2+ consecutive punctuation characters, e.g. :)
ascii_art = r"(?:[{punctuation}][{punctuation}]+)".format(punctuation=string.punctuation)
# a single character that is not alphanumeric or other ascii printable
emoji = r"(?:[^\s])(?<![\w{ascii_printable}])".format(ascii_printable=string.printable)
regexp = r"{normal_word}|{ascii_art}|{emoji}".format(normal_word=normal_word, ascii_art=ascii_art,
                                                     emoji=emoji)

# Generate a word cloud image.
# The Symbola font includes most emoji.
font_path = path.join(d, 'fonts', 'Symbola', 'Symbola.ttf')
wordcloud = WordCloud(font_path=font_path, regexp=regexp).generate(text)

# Display the generated image the matplotlib way.
import matplotlib.pyplot as plt

plt.imshow(wordcloud)
plt.axis("off")
plt.show()
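
To see what the combined regular expression in the example above actually extracts, it can be tried on a short string without generating any image. The sketch below rebuilds the same three sub-patterns and runs them through `re.findall`; the sample sentence is made up for illustration.

```python
import re
import string

# The same three sub-patterns as in the emoji example.
normal_word = r"(?:\w[\w']+)"
ascii_art = r"(?:[{p}][{p}]+)".format(p=string.punctuation)
emoji = r"(?:[^\s])(?<![\w{a}])".format(a=string.printable)
regexp = r"{}|{}|{}".format(normal_word, ascii_art, emoji)

# Made-up sample: two words, an ASCII smiley, and U+1F600 (grinning face).
sample = "It's great :) \U0001F600"
tokens = re.findall(regexp, sample)
print(tokens)
```

The word with an apostrophe, the plain word, the two-character smiley, and the single emoji each come out as one token. A lone ASCII letter such as "I" would match none of the three alternatives and would be dropped.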

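All of the examples above can be downloaded from my GitHub (continually updated with demos of third-party Python libraries, common Python crawlers, using WeChat from Python, and more):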

https://github.com/Snailclimb/Python/tree/master/PythonDemo/wordcloud
