Standard Library - Data Types: collections - Container Datatypes

by: 授客 QQ: 1033553122

Counter objects

Example

>>> from collections import Counter

>>> cnt = Counter()

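>>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
...     cnt[word] += 1  # same as cnt[word] = cnt[word] + 1
...

cnt[word] holds the number of times that list element has been seen so far. A key that is not yet present counts as 0, so each distinct word ends up with exactly one entry, however many times it is repeated in the list.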

>>> cnt

Counter({'blue': 3, 'red': 2, 'green': 1})
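The explicit loop is useful for seeing how the tally builds up, but the same result can be had by handing the list straight to the constructor. A minimal sketch (the variable names manual and direct are only for illustration):

```python
from collections import Counter

words = ['red', 'blue', 'red', 'green', 'blue', 'blue']

# Manual tally, one increment per occurrence.
manual = Counter()
for word in words:
    manual[word] += 1

# Passing the iterable to the constructor performs the same counting.
direct = Counter(words)

print(manual == direct)  # True
print(direct['blue'])    # 3
```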

Contents of test.txt:

the and to of you a my hamlet in

the to of you a my hamlet in

the of you a my hamlet in

the you a my hamlet in

the a my hamlet in

the my hamlet in

the hamlet in

the in

good by 2016

shouke 2017

1099

study python Counter

>>> import re

>>> words = re.findall(r'\w+', open('d:\\test.txt').read().lower())

>>> print(words)

['the', 'and', 'to', 'of', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'and', 'to', 'of', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'and', 'to', 'of', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'to', 'of', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'of', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'you', 'a', 'my', 'hamlet', 'in', 'the', 'a', 'my', 'hamlet', 'in', 'the', 'my', 'hamlet', 'in', 'the', 'hamlet', 'in', 'the', 'in', 'in', 'good', 'by', '2016', 'shouke', '2017', '1099', 'study', 'python', 'counter']

>>> Counter(words).most_common(10)

[('in', 11), ('the', 10), ('hamlet', 9), ('my', 8), ('a', 7), ('you', 6), ('of', 5), ('to', 4), ('and', 3), ('2016', 1)]

Note: \w matches letters, digits, and the underscore, so punctuation and whitespace act as word separators.
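To see what \w+ actually splits on, here is a small sketch that runs the same pattern over an in-memory string. The sample text is made up for illustration and stands in for the file read from d:\test.txt.

```python
import re
from collections import Counter

# Hypothetical sample text, used here instead of reading a file.
text = "The hamlet, the_hamlet and THE in-laws: 2016"

words = re.findall(r'\w+', text.lower())
print(words)
# ['the', 'hamlet', 'the_hamlet', 'and', 'the', 'in', 'laws', '2016']

print(Counter(words).most_common(1))  # [('the', 2)]
```

The underscore keeps the_hamlet together as one word, while the hyphen splits in-laws into two.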

Class description

class collections.Counter([iterable-or-mapping])

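Counter is a dict subclass for counting hashable objects. It is a collection in which the counted elements are stored as dictionary keys and their counts as dictionary values. It was originally documented as unordered; since Python 3.7 it remembers insertion order, like a regular dict.

It can count the elements of an iterable, of another mapping, or of another Counter.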

>>> c = Counter() # an empty counter

>>> c

Counter()

>>> c = Counter('gallahad') # a counter built from an iterable (a string)

>>> c

Counter({'a': 3, 'l': 2, 'd': 1, 'h': 1, 'g': 1})

>>> c = Counter([2, 3, 4, 3, 3, 4]) # a counter built from an iterable (a list)

>>> c

Counter({3: 3, 4: 2, 2: 1})

>>> c = Counter({'red':4, 'blue':2}) # a counter built from a mapping (a dict)

>>> c

Counter({'red': 4, 'blue': 2})

>>> c = Counter(cats=4, dogs=8) # a counter built from keyword arguments

>>> c

Counter({'dogs': 8, 'cats': 4})

A Counter has the dictionary interface, so counter[element] returns the count for element. If the key (the element) is missing, the lookup returns 0 instead of raising KeyError.

>>> c = Counter(['eggs', 'ham'])

>>> c

Counter({'ham': 1, 'eggs': 1})

>>> c['bacon']

0
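One detail worth checking is that looking up a missing element only reports 0; it does not add the element to the counter. A short sketch:

```python
from collections import Counter

c = Counter(['eggs', 'ham'])

print(c['bacon'])    # 0, and no KeyError is raised
print('bacon' in c)  # False: the lookup did not create an entry
print(len(c))        # 2
```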

Change an element's count by assignment, then use del to remove the element's entry altogether:

>>> c = Counter(['shouke', 'shouke', '2017'])

>>> c

Counter({'shouke': 2, '2017': 1})

>>> c['shouke'] = 1

>>> c

Counter({'shouke': 1, '2017': 1})

>>> del c['shouke']

>>> c

Counter({'2017': 1})
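Assigning a count and deleting an entry are not interchangeable: setting a count to 0 leaves the element in the counter, and only del removes it. A minimal sketch of the difference:

```python
from collections import Counter

c = Counter(['shouke', 'shouke', '2017'])

c['shouke'] = 0
print('shouke' in c)  # True: a zero count is still an entry
print(len(c))         # 2

del c['shouke']
print('shouke' in c)  # False
print(len(c))         # 1
```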

New in version 3.1.

Methods

In addition to the methods available on every dictionary, Counter objects support the following three methods:

elements()

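Returns an iterator (not a list) over the elements, repeating each element as many times as its count: an element with count N greater than 0 appears N times. Before Python 3.7 the order of the elements is arbitrary; from 3.7 on they come out in the order in which they were first encountered.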

>>> c = Counter(a=4, b=2, c=0, d=-2)

>>> c.elements()

<itertools.chain object at 0x...>

>>> list(c.elements())

['b', 'b', 'a', 'a', 'a', 'a']

Note: elements whose count is less than one (zero or negative) are ignored.
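Because elements() hands back an iterator, it can be consumed only once; wrap it in list() or sorted() when the values are needed more than once. A small sketch (the name it is only for illustration):

```python
from collections import Counter

c = Counter(a=4, b=2, c=0, d=-2)

it = c.elements()
print(sorted(it))  # ['a', 'a', 'a', 'a', 'b', 'b']
print(list(it))    # []: the iterator is already exhausted
```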

most_common([n])

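Returns a list of the n most common elements and their counts, from most common to least. If n is omitted, all elements are returned. Elements with equal counts come out in arbitrary order (from Python 3.7 on, in the order first encountered).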

>>> Counter('abracadabra').most_common(3)

[('a', 5), ('b', 2), ('r', 2)]

subtract([iterable-or-mapping])

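Subtracts counts in place (the counts of matching elements are subtracted), taking them from an iterable, a mapping, or another Counter. Unlike the - operator, the result may contain zero and negative counts.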

>>> c = Counter(a=4, b=2, c=0, d=-2)

>>> d = Counter(a=1, b=2, c=3, d=4)

>>> c.subtract(d)

>>> c

Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

>>> d = Counter(a=1, b=2, c=3, d=4, e=2)

>>> c.subtract(d)

>>> c

Counter({'a': 2, 'b': -2, 'e': -2, 'c': -6, 'd': -10})

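Note: as shown above, if an element of the second counter does not occur in the first, its count in the first counter is treated as 0.

New in version 3.2.

The following two regular dictionary methods behave differently for Counter objects: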
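subtract() also accepts a plain iterable or a dict, not only another Counter. A short sketch with made-up data, showing that zero and negative counts stay in the counter:

```python
from collections import Counter

c = Counter('aabbb')   # a: 2, b: 3
c.subtract('abz')      # an iterable: each occurrence subtracts 1

print(c['a'], c['b'], c['z'])  # 1 2 -1

c.subtract({'a': 1})   # a mapping of element -> amount to subtract
print(c['a'])          # 0
print('a' in c)        # True: the zero-count entry is kept
```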

fromkeys(iterable)

This class method is not implemented for Counter objects.
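In practice, calling fromkeys on Counter raises NotImplementedError. The sketch below shows that, together with one possible workaround (an assumption of this example, not something the original text prescribes): build a plain dict with dict.fromkeys and pass it to the constructor.

```python
from collections import Counter

raised = False
try:
    Counter.fromkeys('abc')
except NotImplementedError:
    raised = True

print(raised)  # True

# Possible workaround: start every key at 0 via a plain dict.
zeroed = Counter(dict.fromkeys('abc', 0))
print(zeroed['a'], zeroed['b'], zeroed['c'])  # 0 0 0
```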

update([iterable-or-mapping])

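Updates the counts from the given iterable, mapping, or counter. Counts are added to the existing ones rather than replacing them. An iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs.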

>>> c = Counter(a=4, b=2, c=0, d=-2)

>>> c.update(['d', 'd', 'a', 'c', 'e'])

>>> c

Counter({'a': 5, 'b': 2, 'c': 1, 'e': 1, 'd': 0})
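The difference from dict.update is easiest to see side by side. A minimal sketch (the names counts and plain are only for illustration):

```python
from collections import Counter

counts = Counter(a=1, b=2)
plain = dict(a=1, b=2)

counts.update({'a': 10})  # Counter adds: 1 + 10
plain.update({'a': 10})   # dict replaces the old value

print(counts['a'])  # 11
print(plain['a'])   # 10

counts.update('ab')  # an iterable of elements: each occurrence adds 1
print(counts['a'], counts['b'])  # 12 3
```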

Common patterns for working with Counter objects

>>> c

Counter({'a': 5, 'b': 2, 'c': 1, 'e': 1, 'd': 0})

>>> sum(c.values()) # total of all counts

9

>>> c.clear() # reset all counts

>>> c

Counter()

>>> c = Counter(['shou', 'ke', '2014', '2017', '2017'])

>>> c

Counter({'2017': 2, '2014': 1, 'ke': 1, 'shou': 1})

>>> list(c) # list of the unique elements; does not modify the Counter

['2017', '2014', 'ke', 'shou']

>>> set(c) # set of the elements; does not modify the Counter

{'2017', '2014', 'ke', 'shou'}

>>> dict(c) # plain dict of elem:cnt (element: count) pairs; does not modify the Counter

{'2017': 2, '2014': 1, 'ke': 1, 'shou': 1}

>>> c.items() # all (elem, cnt) pairs

dict_items([('2017', 2), ('2014', 1), ('ke', 1), ('shou', 1)])

# Counter(dict([(elem, cnt), (elem, cnt)])) builds a counter from a list of (elem, cnt) pairs

>>> Counter(dict([('shou',2),('ke',1)]))

Counter({'shou': 2, 'ke': 1})

# c.most_common()[:-n:-1] returns the n-1 least common items

>>> c.most_common()[:-3:-1]

[('shou', 1), ('ke', 1)]
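To get exactly the n least common items rather than n-1, the slice needs one more step back: [:-n-1:-1]. A small sketch with n = 2 (the variable names are illustrative):

```python
from collections import Counter

c = Counter('abracadabra')  # a: 5, b: 2, r: 2, c: 1, d: 1
n = 2

# Reverse slice over the sorted list, stopping after n items.
least = c.most_common()[:-n - 1:-1]

print(len(least))                           # 2
print(sorted(count for _, count in least))  # [1, 1]
```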

>>> c = Counter(dict([('shou', 1), ('ke',2), ('2014', 0),('2017',-1)]))

>>> +c # returns a new Counter without the items whose count is zero or negative

Counter({'ke': 2, 'shou': 1})

>>> c

Counter({'ke': 2, 'shou': 1, '2014': 0, '2017': -1})


Mathematical operations

The result keeps only the items whose count is greater than 0.

>>> c = Counter(a=3, b=1)

>>> d = Counter(a=1, b=2)

>>> c + d # c[x] + d[x]

Counter({'a': 4, 'b': 3})

>>> c - d # c[x] - d[x]; b is dropped because 1 - 2 = -1 is not positive

Counter({'a': 2})

>>> c & d # min(c[x], d[x])

Counter({'b': 1, 'a': 1})

>>> c | d # max(c[x], d[x])

Counter({'a': 3, 'b': 2})
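Since subtraction drops everything that is not positive, an empty result means nothing was left over. That gives a compact way to test whether one bag of items covers another. A sketch under that idea; the helper name can_build is hypothetical:

```python
from collections import Counter


def can_build(word, letters):
    """Return True if word can be spelled with the characters in letters."""
    # Whatever remains after subtraction is what letters cannot supply.
    missing = Counter(word) - Counter(letters)
    return not missing


print(can_build('hamlet', 'the hamlet'))  # True
print(can_build('letter', 'hamlet'))      # False
```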

Unary plus and unary minus are shorthand for adding an empty Counter and for subtracting the current Counter from an empty one:

>>> c = Counter(a=2, b=-4)

>>> +c

Counter({'a': 2})

>>> -c

Counter({'b': 4})


New in version 3.3: Added support for unary plus, unary minus, and in-place multiset operations.
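The in-place multiset operations mentioned here are +=, -=, &= and |=. Like their binary counterparts, they leave only positive counts behind. A brief sketch with made-up values:

```python
from collections import Counter

c = Counter(a=3, b=1)

c += Counter(a=1, b=2)
print(dict(c))  # {'a': 4, 'b': 3}

c -= Counter(a=1, b=5)
print(dict(c))  # {'a': 3}  (b: 3 - 5 = -2 is dropped)

c &= Counter(a=2, z=9)
print(dict(c))  # {'a': 2}  (minimum of each count)

c |= Counter(a=1, b=4)
print(dict(c))  # {'a': 2, 'b': 4}  (maximum of each count)
```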

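Application example

Suppose a list holds about one million (100w) items and the goal is to extract the values that appear more than once.

Solution: read the data line by line into a list, build a Counter from that list as shown, then check each element's count. A count greater than 1 means the value is duplicated, so print it. (In the author's test, the processing took about 1 s.)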

# Get the duplicated values
c = Counter(list_data)
for item in c.items():
    if item[1] > 1:
        print('Duplicate value: %s' % item[0])
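The same check can be wrapped in a small function that returns the duplicates instead of printing them. This is only a sketch: the function name find_duplicates and the sample data are made up for illustration, and the output order assumes Python 3.7 or later, where a Counter keeps insertion order.

```python
from collections import Counter


def find_duplicates(items):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(items)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical sample data standing in for the lines read from a file.
list_data = ['1099', '2016', '2017', '2016', '1099', '2016']
print(find_duplicates(list_data))  # ['1099', '2016']
```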

For the other container types, please refer to the official documentation.
