Here is a program that reads a file and builds a histogram of the words in the file:

process_file loops through the lines of the file, passing them one at a time to process_line. The histogram h is being used as an accumulator. process_line uses the string method replace to replace hyphens with spaces before using split to break the line into a list of strings. It traverses the list of words and uses strip and lower to remove punctuation and convert to lower case. (It is a shorthand to say that strings are ‘converted;’ remember that string are immutable, so methods like strip and lower return new strings.)

Finally, process_line updates the histogram by creating a new item incrementing an existing one. To count the total number of words in the file, we can add up the frequencies in the histogram:

from Thinking in Python

Word histogram的更多相关文章

  1. {ICIP2014}{收录论文列表}

    This article come from HEREARS-L1: Learning Tuesday 10:30–12:30; Oral Session; Room: Leonard de Vinc ...

  2. Create and format Word documents using R software and Reporters package

    http://www.sthda.com/english/wiki/create-and-format-word-documents-using-r-software-and-reporters-pa ...

  3. 84. Largest Rectangle in Histogram *HARD* -- 柱状图求最大面积 85. Maximal Rectangle *HARD* -- 求01矩阵中的最大矩形

    1. Given n non-negative integers representing the histogram's bar height where the width of each bar ...

  4. Bag of word based image retrieval

    主要参考维基百科Bag of Word 在DLP领域里,bow(bag of word)是一个稀疏的向量,向量的每个元素记录词的出现次数,相当于对每篇文章都关于词典做词的直方图统计.同样的道理用在co ...

  5. Word/Excel 在线预览

    前言 近日项目中做到一个功能,需要上传附件后能够在线预览.之前也没做过这类似的,于是乎就查找了相关资料,.net实现Office文件预览大概有这几种方式: ① 使用Microsoft的Office组件 ...

  6. C#中5步完成word文档打印的方法

    在日常工作中,我们可能常常需要打印各种文件资料,比如word文档.对于编程员,应用程序中文档的打印是一项非常重要的功能,也一直是一个非常复杂的工作.特别是提到Web打印,这的确会很棘手.一般如果要想选 ...

  7. C# 给word文档添加水印

    和PDF一样,在word中,水印也分为图片水印和文本水印,给文档添加图片水印可以使文档变得更为美观,更具有吸引力.文本水印则可以保护文档,提醒别人该文档是受版权保护的,不能随意抄袭.前面我分享了如何给 ...

  8. 获取打开的Word文档

    using Word = Microsoft.Office.Interop.Word; int _getApplicationErrorCount=0; bool _isMsOffice = true ...

  9. How to accept Track changes in Microsoft Word 2010?

    "Track changes" is wonderful and remarkable tool of Microsoft Word 2010. The feature allow ...

随机推荐

  1. Nginx系列(四)--工作原理

    上篇文章介绍了Nginx框架的设计之管理进程以及多个工作进程的设计.master进程用来管理通过fork子进程与子进程通信.子进程通过处理进程信号接到master的通信去处理请求. Nginx工作原理 ...

  2. hdu 1722 Cake 数学yy

    题链:http://acm.hdu.edu.cn/showproblem.php? pid=1722 Cake Time Limit: 1000/1000 MS (Java/Others)    Me ...

  3. hdu 4882 ZCC Loves Codefires(贪心)

    # include<stdio.h> # include <algorithm> # include <string.h> using namespace std; ...

  4. java发送邮件带附件

    package com.smtp; import java.util.Vector; public class MailBean { private String to; // 收件人 private ...

  5. Get Client IP

    How to get a user's client IP address in ASP.NET? Often you will want to know the IP address of some ...

  6. thinkphp项目上传到github,为什么缺少很多文件

    thinkphp项目上传到github,为什么缺少很多文件 问题: 把tp5项目push到码云(类似github)上,为什么没有thinkphp这个核心库? 然后我看了下码云和github上,官方的t ...

  7. angular4(1)angular脚手架

    angular2之后有了类似于vue-cli的脚手架工具,很方便的帮助我们搭建项目: 1.安装angular命令行工具:npm install @angular/cli -g 2.检测angular- ...

  8. 关于hexo markdown添加的图片在github page中无法显示的问题

    title: 关于hexo markdown添加的图片在github page中无法显示的问题 date: 2018-03-31 00:21:18 categories: methods tags: ...

  9. Android setImageResource与setImageBitmap的区别

    同样的布局文件,小分辨率手机: 1.使用setImageBitmap设置时,出现如下现象: 2.使用setImageResource时,图片显示正常 原因:setImageResource(id)会根 ...

  10. LeetCode 190. Reverse Bits (算32次即可)

    题目: 190. Reverse Bits Reverse bits of a given 32 bits unsigned integer. For example, given input 432 ...