# Word Frequency Counting with Python

mytext = """Background
Industrial Light & Magic (ILM) was started in 1975 by filmmaker George Lucas, in order to create the special effects for the original Star Wars film. Since then, ILM has grown into a visual effects powerhouse that has contributed not just to the entire Star Wars series, but also to films as diverse as Forrest Gump, Jurassic Park, Who Framed Roger Rabbit, Raiders of the Lost Ark, and Terminator 2. ILM has won numerous Academy Awards for Best Visual Effects, not to mention a string of Clio awards for its work on television advertisements.
While much of ILM's early work was done with miniature models and motion controlled cameras, ILM has long been on the bleeding edge of computer generated visual effects. Its computer graphics division dates back to 1979, and its first CG production was the 1982 Genesis sequence from Star Trek II: The Wrath of Khan.
In the early days, ILM was involved with the creation of custom computer graphics hardware and software for scanning, modeling, rendering, and compositing (the process of joining rendered and scanned images together). Some of these systems made significant advances in areas such as morphing and simulating muscles and hair.
Naturally, as time went by many of the early innovations at ILM made it into the commercial realm, but the company's position on the cutting edge of visual effects technology continues to rely on an ever-changing combination of custom in-house technologies and commercial products.
Today, ILM runs a batch processing environment capable of modeling, rendering and compositing tens of thousands of motion picture frames per day. Thousands of machines running Linux, IRIX, Compaq Tru64, OS X, Solaris, and Windows join together to provide a production pipeline that is used by approximately eight hundred users daily, many of whom write or modify code that controls every step of the production process.
In this context, hundreds of commercial and in-house software components are combined to create and process each frame of computer-generated or enhanced film.
Making all this work, and keeping it working, requires a certain degree of technical wizardry, as well as a tool set that is up to the task of integrating diverse and frequently changing systems.
Enter Python
Back in 1996, in the 101 Dalmation days, ILM was exclusively an SGI IRIX shop, and the production pipeline was controlled by Unix shell scripting.
At that time, ILM was producing 15-30 shots per show, typically only a small part of each feature length film to which they were contributing."""


def wordcount(text):
    # Lowercase the text and split it on single spaces.
    # Note: newlines are removed outright rather than replaced with a space,
    # so the words on either side of a line break are fused together
    # (this is where tokens such as 'advertisements.while' in the output come from).
    word_list = text.replace('\n', '').lower().split(" ")
    count_dict = {}
    for word in word_list:
        if word in count_dict:
            count_dict[word] = count_dict[word] + 1
        else:
            count_dict[word] = 1
    # Sort the (word, count) pairs by count, most frequent first.
    count_list = sorted(count_dict.items(), key=lambda x: x[1], reverse=True)
    return count_list


print(wordcount(mytext))

Example output:

E:\Python37\python.exe E:/PythonTest/Test/Test002.py
[('of', ), ('the', ), ('and', ), ('to', ), ('ilm', ), ('was', ), ('a', ), ('in', ), ('as', ), ('that', ), ('by', ), ('for', ), ('has', ), ('visual', ), ('on', ), ('production', ), ('effects', ), ('star', ), ('its', ), ('early', ), ('computer', ), ('commercial', ), ('create', ), ('wars', ), ('into', ), ('not', ), ('but', ), ('diverse', ), ('awards', ), ('work', ), ('with', ), ('motion', ), ('controlled', ), ('edge', ), ('graphics', ), ('days,', ), ('custom', ), ('software', ), ('modeling,', ), ('compositing', ), ('process', ), ('made', ), ('many', ), ('at', ), ('it', ), ('an', ), ('in-house', ), ('thousands', ), ('per', ), ('pipeline', ), ('is', ), ('or', ), ('this', ), ('each', ), ('background', ), ('industrial', ), ('light', ), ('&', ), ('magic', ), ('(ilm)', ), ('started', ), ('', ), ('filmmaker', ), ('george', ), ('lucas,', ), ('order', ), ('special', ), ('original', ), ('film.', ), ('since', ), ('then,', ), ('grown', ), ('powerhouse', ), ('contributed', ), ('just', ), ('entire', ), ('series,', ), ('also', ), ('films', ), ('forrest', ), ('gump,', ), ('jurassic', ), ('park,', ), ('who', ), ('framed', ), ('roger', ), ('rabbit,', ), ('raiders', ), ('lost', ), ('ark,', ), ('terminator', ), ('2.', ), ('won', ), ('numerous', ), ('academy', ), ('best', ), ('effects,', ), ('mention', ), ('string', ), ('clio', ), ('television', ), ('advertisements.while', ), ('much', ), ("ilm's", ), ('done', ), ('miniature', ), ('models', ), ('cameras,', ), ('long', ), ('been', ), ('bleeding', ), ('generated', ), ('effects.', ), ('division', ), ('dates', ), ('back', ), ('1979,', ), ('first', ), ('cg', ), ('', ), ('genesis', ), ('sequence', ), ('from', ), ('trek', ), ('ii:', ), ('wrath', ), ('khan.in', ), ('involved', ), ('creation', ), ('hardware', ), ('scanning,', ), ('rendering,', ), ('(the', ), ('joining', ), ('rendered', ), ('scanned', ), ('images', ), ('together).', ), ('some', ), ('these', ), ('systems', ), ('significant', ), ('advances', ), ('areas', ), ('such', ), ('morphing', ), ('simulating', ), ('muscles', ), ('hair.naturally,', ), ('time', ), ('went', ), ('innovations', ), ('realm,', ), ("company's", ), ('position', ), ('cutting', ), ('technology', ), ('continues', ), ('rely', ), ('ever-changing', ), ('combination', ), ('technologies', ), ('products.today,', ), ('runs', ), ('batch', ), ('processing', ), ('environment', ), ('capable', ), ('rendering', ), ('tens', ), ('picture', ), ('frames', ), ('day.', ), ('machines', ), ('running', ), ('linux,', ), ('irix,', ), ('compaq', ), ('tru64,', ), ('os', ), ('x,', ), ('solaris,', ), ('windows', ), ('join', ), ('together', ), ('provide', ), ('used', ), ('approximately', ), ('eight', ), ('hundred', ), ('users', ), ('daily,', ), ('whom', ), ('write', ), ('modify', ), ('code', ), ('controls', ), ('every', ), ('step', ), ('process.', ), ('context,', ), ('hundreds', ), ('components', ), ('are', ), ('combined', ), ('frame', ), ('computer-generated', ), ('enhanced', ), ('film.making', ), ('all', ), ('work,', ), ('keeping', ), ('working,', ), ('requires', ), ('certain', ), ('degree', ), ('technical', ), ('wizardry,', ), ('well', ), ('tool', ), ('set', ), ('up', ), ('task', ), ('integrating', ), ('frequently', ), ('changing', ), ('systems.enter', ), ('pythonback', ), ('1996,', ), ('', ), ('dalmation', ), ('exclusively', ), ('sgi', ), ('irix', ), ('shop,', ), ('unix', ), ('shell', ), ('scripting.', ), ('time,', ), ('producing', ), ('15-30', ), ('shots', ), ('show,', ), ('typically', ), ('only', ), ('small', ), ('part', ), ('feature', ), ('length', ), ('film', ), ('which', ), ('they', ), ('were', ), ('contributing.', )]

Process finished with exit code 0
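
The output above keeps punctuation attached to words ('days,', '(ilm)', 'film.') and fuses words across line breaks ('advertisements.while'), so the same word can be counted under several different keys. One possible refinement, sketched here and not part of the original post, is to extract tokens with a regular expression and let `collections.Counter` do the counting. The function name `word_frequencies`, the `top_n` parameter, and the token pattern are assumptions chosen for illustration.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=None):
    """Return (word, count) pairs sorted by descending count.

    A token is a run of letters or digits, optionally joined by single
    apostrophes or hyphens (e.g. "ilm's", "in-house", "15-30"). Surrounding
    punctuation and line breaks are never part of a token.
    """
    # Assumed tokenization rule for this sketch; adjust it to your own data.
    words = re.findall(r"[a-z0-9]+(?:['-][a-z0-9]+)*", text.lower())
    # most_common(None) returns every pair; most_common(n) returns the top n.
    return Counter(words).most_common(top_n)


sample = "The days,\nthe days. The in-house tools."
print(word_frequencies(sample, top_n=2))  # → [('the', 3), ('days', 2)]
```

Calling `word_frequencies(mytext, top_n=10)` on the text from the post would list only the ten most frequent words instead of printing every token.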

