Using the collections Library

Date: 2019-05-27

Author: Sun

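The collections Library

Python has a number of built-in data types, such as str, int, list, tuple, and dict. The collections module builds on these and provides several additional data types:

namedtuple(): a factory that creates tuple subclasses whose elements can be accessed by name

deque: a double-ended queue that supports fast appends and pops at either end

Counter: a counter, used mainly for tallying how often items occur

OrderedDict: a dictionary that remembers insertion order

defaultdict: a dictionary that supplies a default value for missing keys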

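1. namedtuple

namedtuple creates data objects whose elements can be accessed by name. It is typically used to make code more readable.

Creating instances

Suppose we have a data structure like the one below, in which each object is a tuple of three elements.

With namedtuple, these plain tuples can easily be turned into a data structure that is more readable and more convenient to use.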

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/5/27 0:19'

from collections import namedtuple

# Each record is a (name, url, founder) tuple.
website_list = [
    ('Sohu', 'http://www.sohu.com/', u'张朝阳'),
    ('Sina', 'http://www.sina.com.cn/', u'王志东'),
    ('163', 'http://www.163.com/', u'丁磊')
]

# Define a tuple subclass whose fields can be accessed by name.
Website = namedtuple('Website', ['name', 'url', 'founder'])

for record in website_list:
    # _make() builds a Website from any iterable with three items.
    website = Website._make(record)
    print(website)
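
To make the readability benefit concrete, here is a minimal sketch of what a namedtuple instance offers beyond a plain tuple. The record values ('Example', 'http://example.com/', 'Alice') are made up for illustration and are not part of the original data.

```python
from collections import namedtuple

Website = namedtuple('Website', ['name', 'url', 'founder'])

# Hypothetical record, used only for illustration.
site = Website(name='Example', url='http://example.com/', founder='Alice')

by_name = site.url                        # access a field by name
by_index = site[1]                        # it is still a tuple, so indexing works
as_dict = site._asdict()                  # mapping of field name -> value
renamed = site._replace(name='Example2')  # returns a new instance; the original is unchanged

print(by_name, by_index)
print(as_dict)
print(renamed)
```

Because tuples are immutable, _replace() does not modify site; it returns a copy with the given fields changed.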

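2. deque

deque is short for "double-ended queue". Its main advantage over a list is that adding and removing items at the head of the queue is fast: .popleft(), .appendleft().

As a double-ended queue, deque also provides some other handy methods, such as rotate.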

import collections

raw = [1, 2, 3]
d = collections.deque(raw)
print(d)                    # deque([1, 2, 3])

# Append on the right
d.append(4)
print(d)                    # deque([1, 2, 3, 4])

# Append on the left
d.appendleft(0)
print(d)                    # deque([0, 1, 2, 3, 4])

# Extend on the right
d.extend([5, 6, 7])
print(d)                    # deque([0, 1, 2, 3, 4, 5, 6, 7])

# Extend on the left (items are added one at a time, so their order is reversed)
d.extendleft([-3, -2, -1])
print(d)                    # deque([-1, -2, -3, 0, 1, 2, 3, 4, 5, 6, 7])

# Pop from the right
r_pop = d.pop()
print(r_pop)                # 7
print(d)                    # deque([-1, -2, -3, 0, 1, 2, 3, 4, 5, 6])

# Pop from the left
l_pop = d.popleft()
print(l_pop)                # -1
print(d)                    # deque([-2, -3, 0, 1, 2, 3, 4, 5, 6])

# Move the n rightmost elements to the left end
print(d)                    # before: deque([-2, -3, 0, 1, 2, 3, 4, 5, 6])
d.rotate(3)
print(d)                    # after rotate: deque([4, 5, 6, -2, -3, 0, 1, 2, 3])
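
The rotate step is easier to follow on a small deque. The sketch below uses a made-up five-element deque to show that a negative argument rotates in the opposite direction, and that a single rotate(1) behaves like appendleft(pop()).

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])
d.rotate(2)                  # move two items from the right end to the left end
right_rotated = list(d)      # [4, 5, 1, 2, 3]

d.rotate(-2)                 # a negative n rotates to the left, undoing the step above
restored = list(d)           # [1, 2, 3, 4, 5]

# One step of rotate(1) gives the same result as appendleft(pop()).
manual = deque([1, 2, 3, 4, 5])
manual.appendleft(manual.pop())
one_step = deque([1, 2, 3, 4, 5])
one_step.rotate(1)

print(right_rotated, restored, manual == one_step)
```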

Example: an endlessly looping marquee

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/5/27 0:19'

import sys
import time
from collections import deque

fancy_loading = deque('>--------------------')

while True:
    # '\r' returns to the start of the line; end='' suppresses the newline
    # so that each frame overwrites the previous one.
    print('\r%s' % ''.join(fancy_loading), end='')
    fancy_loading.rotate(1)
    sys.stdout.flush()
    time.sleep(0.08)
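
Since the loop above never terminates, it can be hard to see what each frame looks like. The following sketch collects a fixed number of frames instead; marquee_frames is a hypothetical helper introduced here for illustration, not part of the original script.

```python
from collections import deque

def marquee_frames(pattern, count):
    """Return `count` successive frames of a rotating marquee (hypothetical helper)."""
    belt = deque(pattern)
    frames = []
    for _ in range(count):
        frames.append(''.join(belt))
        belt.rotate(1)       # shift every character one position to the right
    return frames

frames = marquee_frames('>---', 5)
print(frames)
```

After as many rotations as there are characters, the pattern returns to its starting state, which is why the animation loops seamlessly.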

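3. Counter

The Counter class tracks how many times each value occurs. It is a container that stores its data as dictionary key-value pairs, with the elements as keys and their counts as values. A count may be any integer, including zero and negative numbers.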

>>> from collections import Counter
>>> c = Counter("abcdcba")
>>> c
Counter({'a': 2, 'b': 2, 'c': 2, 'd': 1})
>>> c["b"] = 0
>>> c
Counter({'a': 2, 'c': 2, 'd': 1, 'b': 0})
>>> del c["a"]
>>> c
Counter({'c': 2, 'd': 1, 'b': 0})
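
The session above sets one count to zero and deletes another key; the two are not the same thing. A short sketch of the difference, which also shows that looking up a missing element returns 0 rather than raising KeyError:

```python
from collections import Counter

c = Counter("abcdcba")

missing = c["z"]           # 0: a missing element has a count of zero
has_z = "z" in c           # False: the lookup did not add the key

c["b"] = 0                 # the key stays in the counter with a count of zero
has_b = "b" in c           # True

del c["b"]                 # del removes the key entirely
has_b_after = "b" in c     # False

print(missing, has_z, has_b, has_b_after)
```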

Commonly used methods:

(1) elements()

elements() returns an iterator over the elements, repeating each key as many times as its count.

from collections import Counter

c = Counter(cats=4, dogs=8, mouse=2)
print(list(c.elements()))
# Output: ['cats', 'cats', 'cats', 'cats', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'dogs', 'mouse', 'mouse']
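
Because counts may be zero or negative, it helps to know how elements() treats them: keys whose count is less than one are skipped. A minimal sketch with made-up counts:

```python
from collections import Counter

# Hypothetical counts, including a zero and a negative value.
c = Counter(cats=2, dogs=0, mouse=-1, birds=1)

expanded = list(c.elements())   # keys with a count below one are left out
print(expanded)                 # ['cats', 'cats', 'birds']
```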

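(2) most_common([n])

n is an optional argument. If n is omitted, all elements are returned, ordered from most to least common. Passing -1 returns an empty list. If n is smaller than the number of distinct elements, the n most common ones are returned; if n is equal to (or greater than) that number, all elements are returned.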

# Get the 2 most common elements
print(c.most_common(2))
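
The different values of n described above can be compared side by side. This sketch reuses the cats/dogs/mouse counter from the elements() example:

```python
from collections import Counter

c = Counter(cats=4, dogs=8, mouse=2)

top_two = c.most_common(2)      # the two highest counts, as (element, count) pairs
everything = c.most_common()    # n omitted: all elements, most common first
oversized = c.most_common(10)   # n larger than the number of keys: also all elements

print(top_two)                  # [('dogs', 8), ('cats', 4)]
print(everything)               # [('dogs', 8), ('cats', 4), ('mouse', 2)]
```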

Example: using Counter to count how often each character appears in a passage

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/5/27 0:19'

from collections import Counter

s = '''A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.'''.lower()

# Iterating over a string yields its characters, so this counts every character.
c = Counter(s)

# Get the 5 most common characters
print(c.most_common(5))

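4. OrderedDict

Before Python 3.7, the built-in dict made no guarantee about the order of its keys because of the way hashing works, and this was sometimes inconvenient. Fortunately, the collections module provides OrderedDict, which remembers the order in which keys were inserted. Since Python 3.7 a plain dict also preserves insertion order, but OrderedDict still offers extras such as move_to_end() and order-sensitive equality comparison.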

from collections import OrderedDict

d = OrderedDict(a=11, b=2, c=3)
print(d)    # keeps insertion order (a, b, c); the entries are not sorted
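
"Ordered" here means insertion order, not sorted order. The sketch below illustrates the distinction, along with move_to_end() and order-sensitive equality; the extra key 'd' and the small comparison dictionaries are made up for illustration.

```python
from collections import OrderedDict

d = OrderedDict(a=11, b=2, c=3)
d['d'] = 4                       # new keys are appended at the end
d.move_to_end('a')               # move an existing key to the end
keys_after = list(d)             # ['b', 'c', 'd', 'a']

# Sorting is a separate, explicit step.
by_key = OrderedDict(sorted(d.items()))
sorted_keys = list(by_key)       # ['a', 'b', 'c', 'd']

# Two OrderedDicts compare equal only if their order matches; plain dicts ignore order.
ordered_equal = OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1)   # False
plain_equal = dict(a=1, b=2) == dict(b=2, a=1)                   # True

print(keys_after, sorted_keys, ordered_equal, plain_equal)
```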

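5. defaultdict

With Python's built-in dict, accessing d[key] raises a KeyError when the key does not exist.

With defaultdict, you pass in a default factory function. When a missing key is requested, the factory is called and its result is stored as that key's default value.

defaultdict is a good fit whenever one key maps to a list of values.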

# -*- coding: utf-8 -*-
__author__ = 'sun'
__date__ = '2019/5/27 0:19'

from collections import defaultdict

members = [
    # sex, name
    ['male', 'John'],
    ['male', 'Jack'],
    ['female', 'Lily'],
    ['male', 'Pony'],
    ['female', 'Lucy'],
]

# Desired grouping:
# {
#   'male': ['John', 'Jack', 'Pony'],
#   'female': ['Lily', 'Lucy']
# }

result = defaultdict(list)

for sex, name in members:
    # A missing key gets an empty list from the factory before append() runs.
    result[sex].append(name)

print(result)
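
The contrast with a plain dict can be shown directly. This is a minimal sketch; the keys 'unknown' and 'other' are made up to show that merely reading a missing key inserts it, while .get() does not call the factory.

```python
from collections import defaultdict

plain = {}
try:
    plain['male'].append('John')     # a plain dict has no 'male' key yet
    raised = False
except KeyError:
    raised = True

grouped = defaultdict(list)
grouped['male'].append('John')       # list() is called on first access
_ = grouped['unknown']               # reading a missing key also inserts it
looked_up = grouped.get('other')     # .get() bypasses the factory and returns None
keys = sorted(grouped)               # ['male', 'unknown']

print(raised, dict(grouped), looked_up)
```

If the side effect of inserting keys on read is unwanted, use .get() or an `in` check for lookups.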

Merging list elements by key and sorting the result

from collections import defaultdict

s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
d = defaultdict(list)

# Collect all values that share the same key.
for k, v in s:
    d[k].append(v)

# Sort the (key, values) pairs by key.
a = sorted(d.items())
print(a)

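defaultdict can also be used for counting: just set default_factory to int.

When a letter appears in the string for the first time, the dictionary has no entry for it, so default_factory calls int() to supply a default value of 0. The addition then accumulates the number of times each letter occurs.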

from collections import defaultdict

s = 'i am an chinese man, i love china.'
d = defaultdict(int)

for k in s:
    d[k] += 1

print('\n', d)

a = sorted(d.items())
print('\n', a)
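
For comparison, the same tally can be produced with Counter from section 3. The sketch below checks that both approaches give identical counts for the string used above.

```python
from collections import Counter, defaultdict

s = 'i am an chinese man, i love china.'

# Manual counting with defaultdict(int), as in the example above.
d = defaultdict(int)
for ch in s:
    d[ch] += 1

# Counter does the same tally in a single call.
c = Counter(s)

same = dict(d) == dict(c)
print(same)
```

defaultdict(int) is the more general tool, since the loop body can do more than add one; Counter is shorter when plain counting is all that is needed.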