Word histogram
Here is a program that reads a file and builds a histogram of the words in the file:

process_file loops through the lines of the file, passing them one at a time to process_line. The histogram h is used as an accumulator. process_line uses the string method replace to replace hyphens with spaces before using split to break the line into a list of strings. It traverses the list of words and uses strip and lower to remove punctuation and convert to lower case. (It is shorthand to say that strings are "converted"; remember that strings are immutable, so methods like strip and lower return new strings.)
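The two functions described above might look like the following minimal sketch. It is reconstructed from the description, so the details are assumptions: in particular, stripping `string.punctuation` and `string.whitespace`, and the guard that skips tokens made up only of punctuation, are not stated in the text.

```python
import string


def process_line(line, h):
    """Update histogram h with the words found in one line of text."""
    # Treat hyphens as word separators before splitting on whitespace.
    line = line.replace('-', ' ')
    for word in line.split():
        # strip and lower return new strings; the original is unchanged.
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        if not word:
            # Assumption: ignore tokens that were nothing but punctuation.
            continue
        # Create a new item or increment an existing one.
        h[word] = h.get(word, 0) + 1


def process_file(filename):
    """Return a dict mapping each word in the file to its frequency."""
    h = {}  # the accumulator
    with open(filename) as fp:
        for line in fp:
            process_line(line, h)
    return h
```

Calling `process_file` with the path of any plain-text file returns the histogram as a dictionary. `h.get(word, 0)` covers both the new-item and the existing-item case in a single expression.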
Finally, process_line updates the histogram by creating a new item or incrementing an existing one. To count the total number of words in the file, we can add up the frequencies in the histogram:
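One way to write that sum, as a sketch: the function name `total_words` and the sample histogram are assumptions chosen for illustration.

```python
def total_words(h):
    """Return the total number of words counted in histogram h."""
    # Each value is the frequency of one distinct word.
    return sum(h.values())


# Hypothetical histogram, of the kind a very small file might produce.
h = {'the': 2, 'cat': 2, 'dog': 1}
print(total_words(h))  # 5
```

The number of distinct words, by contrast, is simply `len(h)`.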

from Think Python