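[Python] Beginner's Study Notes (10)

# The posts in this beginner series are my study notes from working through 《Python编程入门（第3版）》

Collect statistics about a text file and print its 10 most frequent words.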

#text.py

# Characters to keep

keep = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p',

        'q','r','s','t','u','v','w','x','y','z',' ','-',"'"}

# Normalize the text

def normalize(s):

    """Convert s to a normalized string."""

    result = ''

    for c in s.lower():

        if c in keep:

            result += c

    return result

# Print basic statistics for a text file

def file_stats(fname):

    """Print statistics for the given file."""

    with open(fname, 'r') as f:

        s = f.read()

    num_chars = len(s)

    num_lines = s.count('\n')

    num_words = len(normalize(s).split())

    print("The file %s has:" % fname)

    print("  %s characters" % num_chars)

    print("  %s lines" % num_lines)

    print("  %s words" % num_words)

# Build a word-frequency dictionary from a string

def make_freq_dict(s):

    """Return a dictionary whose keys are the words of s, and whose values are the counts of those words."""

    s = normalize(s)

    words = s.split()

    d = {}

    for w in words:

        if w in d:

            d[w] += 1

        else:

            d[w] = 1

    return d

# Print basic statistics plus the most frequent words

def file_stats2(fname):

    """Print statistics for the given file."""

    with open(fname, 'r') as f:

        s = f.read()

    num_chars = len(s)

    num_lines = s.count('\n')

    d = make_freq_dict(s)

    num_words = sum(d[w] for w in d)

    lst = [(d[w],w) for w in d]

    lst.sort()

    lst.reverse()

    print("The file %s has:" % fname)

    print("  %s characters" % num_chars)

    print("  %s lines" % num_lines)

    print("  %s words" % num_words)

    print("\nThe top 10 most frequent words are:")

    i = 1

    for count,word in lst[:10]:

        print('%2s. %4s %s' % (i, count, word))

        i += 1

>>> file_stats2('a.txt')

The file a.txt has:

  12927 characters

  297 lines

  1645 words

The top 10 most frequent words are:

 1.   62 to

 2.   62 the

 3.   47 is

 4.   42 a

 5.   41 of

 6.   40 it

 7.   36 that

 8.   35 and

 9.   32 as

10.   24 so
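The normalization step in the listing above can be sketched more compactly. The version below is only an illustration under a few assumptions: the name `KEEP` is hypothetical, the letters come from `string.ascii_lowercase` instead of being typed out one by one, and every whitespace character (including a newline) is turned into a space. That last choice differs from the listing, which drops newlines, so word counts from this sketch may not match the output shown above.

```python
import string

# Assumption: same character policy as the listing (lowercase letters,
# space, hyphen, apostrophe). KEEP is a hypothetical name for this sketch.
KEEP = set(string.ascii_lowercase) | {' ', '-', "'"}


def normalize(s):
    """Lowercase s, map any whitespace to a space, drop characters not in KEEP."""
    chars = []
    for c in s.lower():
        if c.isspace():
            # Keep word boundaries even across line breaks and tabs.
            chars.append(' ')
        elif c in KEEP:
            chars.append(c)
    return ''.join(chars)


sample = "Hello, World!\nIt's a well-known fact."
print(normalize(sample))          # hello world it's a well-known fact
print(normalize(sample).split())  # list of the six normalized words
```

Building the set from `string.ascii_lowercase` also sidesteps a quiet pitfall of hand-typed sets: Python joins adjacent string literals, so `'p' 'q'` with a missing comma becomes the single element `'pq'`.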

An improved version of the code:

#text.py

# Characters to keep

keep = {'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p',

        'q','r','s','t','u','v','w','x','y','z',' ','-',"'"}

# Normalize the text

def normalize(s):

    """Convert s to a normalized string."""

    result = ''

    for c in s.lower():

        if c in keep:

            result += c

    return result

# Print basic statistics for a text file

def file_stats(fname):

    """Print statistics for the given file."""

    with open(fname, 'r') as f:

        s = f.read()

    num_chars = len(s)

    num_lines = s.count('\n')

    num_words = len(normalize(s).split())

    print("The file %s has:" % fname)

    print("  %s characters" % num_chars)

    print("  %s lines" % num_lines)

    print("  %s words" % num_words)

# Build a word-frequency dictionary from a string

def make_freq_dict(s):

    """Return a dictionary whose keys are the words of s, and whose values are the counts of those words."""

    s = normalize(s)

    words = s.split()

    d = {}

    for w in words:

        if w in d:

            d[w] += 1

        else:

            d[w] = 1

    return d

# Print basic statistics plus the most frequent words

def file_stats2(fname):

    """Print statistics for the given file."""

    with open(fname, 'r') as f:

        s = f.read()

    num_chars = len(s)

    num_lines = s.count('\n')

    d = make_freq_dict(s)

    num_different_words = len(d)

    num_words = sum(d[w] for w in d)

    words_average_length = sum(len(w) for w in d)/num_different_words

    num_once = sum(d[w] for w in d if d[w] == 1)

    lst = [(d[w],w) for w in d]

    lst.sort()

    lst.reverse()

    print("The file %s has:" % fname)

    print("  %s characters" % num_chars)

    print("  %s lines" % num_lines)

    print("  %s words" % num_words)

    print("  %s words appearing only once" % num_once)

    print("  %s different words" % int(num_different_words))

    print("  %s average length" % words_average_length)

    print("\nThe top 10 most frequent words are:")

    i = 1

    for count,word in lst[:10]:

        print('%2s. %4s %s' % (i, count, word))

        i += 1

def main():

    file_stats2('a.txt')

if __name__=='__main__':

    main()

>>> ================================ RESTART ================================

>>>

The file a.txt has:

  12927 characters

  297 lines

  1645 words

  515 words appearing only once

  699 different words

  6.539341917024321 average length

The top 10 most frequent words are:

 1.   62 to

 2.   62 the

 3.   47 is

 4.   42 a

 5.   41 of

 6.   40 it

 7.   36 that

 8.   35 and

 9.   32 as

10.   24 so
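The same statistics can also be computed with `collections.Counter` from the standard library. The following is a minimal sketch, not the book's method: `words_of` and `text_stats` are hypothetical helpers, the function works on an in-memory string instead of a file, and it splits on whitespace before filtering characters, so its counts may differ from the output above.

```python
from collections import Counter
import string

# Assumption: words may contain lowercase letters, hyphens and apostrophes.
KEEP = set(string.ascii_lowercase) | {'-', "'"}


def words_of(text):
    """Split text on whitespace, then strip characters outside KEEP from each word."""
    words = []
    for raw in text.lower().split():
        word = ''.join(c for c in raw if c in KEEP)
        if word:  # skip tokens that were pure punctuation or digits
            words.append(word)
    return words


def text_stats(text, top_n=10):
    """Return word statistics for text as a dictionary (hypothetical helper)."""
    counts = Counter(words_of(text))
    distinct = len(counts)
    return {
        'words': sum(counts.values()),
        'distinct': distinct,
        'once': sum(1 for n in counts.values() if n == 1),
        # Average length over distinct words; 0.0 for empty input.
        'average_length': sum(len(w) for w in counts) / distinct if distinct else 0.0,
        'top': counts.most_common(top_n),
    }


stats = text_stats("the cat and the hat and the bat", top_n=2)
print(stats['top'])   # [('the', 3), ('and', 2)]
print(stats['words'], stats['distinct'], stats['once'])
```

One difference to keep in mind: `most_common` lists words with equal counts in the order they were first seen, while sorting `(count, word)` tuples and reversing, as in the listing, orders ties in reverse alphabetical order.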
