Python built-in modules: itertools

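Preface

itertools is Python's built-in iterator module. It defines functions that build various kinds of iterators. Instead of materializing whole sequences when you traverse or combine iterables, these iterators produce elements lazily, one at a time, which saves memory.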

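Types of iterator functions

Infinite iterators: count, cycle and repeat, which produce endless sequences;
Finite iterators: take one or more sequences as arguments and chain, group, filter them, etc.; they stop when their input is exhausted (the official docs call them "iterators terminating on the shortest input sequence");
Combinatoric iterators: permutations, combinations and products of one or more sequences.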

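Infinite iterators

count

count is itself an iterator: it implements __iter__ and __next__ and produces an endless sequence of evenly spaced numbers.

# Parameters:

start=0: the first value

step=1: the increment (a float works too)

from itertools import count

for i in count():
    print(i)
    if i == 1000:  # count never stops on its own, so break explicitly
        break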
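Because count never ends, it is usually paired with something that stops it, such as zip or islice. The sketch below shows both, plus a hypothetical generator `my_count` (not part of itertools) that behaves roughly like count, to illustrate that count is just a lazy counter.

```python
from itertools import count, islice

# zip stops at the shorter input, so count only runs as far as the list
numbered = list(zip(count(1), ['a', 'b', 'c']))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# step may be a float; islice takes just the first four values
floats = list(islice(count(10, 2.5), 4))
print(floats)  # [10, 12.5, 15.0, 17.5]


def my_count(start=0, step=1):
    """Hypothetical pure-Python stand-in for count (illustration only)."""
    n = start
    while True:
        yield n
        n += step


print(list(islice(my_count(10, 2.5), 4)))  # [10, 12.5, 15.0, 17.5]
```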

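accumulate

accumulate produces running accumulations of a sequence (running sums by default). Note that accumulate is not actually an infinite iterator: it stops when its input is exhausted, and the official documentation lists it among the finite iterators.

# Parameters

iterable: the iterable

func: a two-argument function applied cumulatively; defaults to addition

from itertools import accumulate

print(list(accumulate([1, 2, 3, 4], func=lambda x, y: x + y)))  # addition, which is also the default

# Output

[1, 3, 6, 10]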
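func does not have to be addition; any two-argument function works. A short sketch with `max` and `operator.mul`, plus the `initial` keyword (available since Python 3.8, not covered above), which prepends a starting value:

```python
import operator
from itertools import accumulate

data = [3, 1, 4, 1, 5]

running_max = list(accumulate(data, max))                # running maximum
running_product = list(accumulate(data, operator.mul))   # running product
with_initial = list(accumulate([1, 2, 3], initial=100))  # Python 3.8+

print(running_max)      # [3, 3, 4, 4, 5]
print(running_product)  # [3, 3, 12, 12, 60]
print(with_initial)     # [100, 101, 103, 106]
```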

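cycle

An iterator that repeats an iterable endlessly. It saves a copy of the elements during the first pass, so it needs memory proportional to the input.

# Parameters:

iterable: the iterable

from itertools import cycle

j = 0
for i in cycle(range(2)):
    j += 1
    print(i)  # prints 0 1 0 1 0 1, one per line
    if j == 6:
        break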
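The manual counter can be replaced by islice, and cycle combines well with zip for round-robin assignment. A small sketch; the task and worker names are made up for illustration:

```python
from itertools import cycle, islice

# take the first five items instead of counting by hand
first_five = list(islice(cycle('AB'), 5))
print(first_five)  # ['A', 'B', 'A', 'B', 'A']

# round-robin: zip stops when the (finite) task list runs out
tasks = ['t1', 't2', 't3', 't4', 't5']
workers = ['w1', 'w2']  # hypothetical worker names
assignment = list(zip(tasks, cycle(workers)))
print(assignment)  # [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1'), ('t4', 'w2'), ('t5', 'w1')]
```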

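repeat

repeat yields the same object over and over. Note that every item is a reference to one and the same object; no copies are made.

# Parameters

object: any object;

times: how many times to yield it; infinite if omitted;

from itertools import repeat

for i in repeat([1, 2, 3], times=2):
    print(i)  # prints [1, 2, 3] twice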
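The "same object" remark matters for mutable values, as this sketch shows. It also shows a common use of repeat: feeding a constant argument to map.

```python
from itertools import repeat

# all three elements are the same list, so mutating one changes "all" of them
rows = list(repeat([], 3))
rows[0].append(1)
print(rows)  # [[1], [1], [1]]
print(all(r is rows[0] for r in rows))  # True

# constant second argument for pow: 0**2, 1**2, ..., 4**2
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```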

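Finite iterators

Concatenating iterators: chain

chain joins several iterables end to end and returns a new iterator.

# Parameters: any number of iterables

from itertools import chain

x = list(chain(range(5), range(4)))
print(x)  # [0, 1, 2, 3, 4, 0, 1, 2, 3]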
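When the iterables are themselves stored in one container, chain.from_iterable takes that single iterable and flattens it by one level, lazily. chain itself accepts any mix of iterables:

```python
from itertools import chain

nested = [[1, 2], [3], [], [4, 5]]
flat = list(chain.from_iterable(nested))
print(flat)  # [1, 2, 3, 4, 5]

# strings, tuples and dicts (which yield their keys) all work
mixed = list(chain('ab', (1, 2), {'k': 0}))
print(mixed)  # ['a', 'b', 1, 2, 'k']
```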

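Filtering data: compress

compress filters data with a mask: when an element of selectors is truthy, the element of data at the same position is kept, otherwise it is dropped. Iteration stops as soon as either data or selectors is exhausted, so data elements without a matching selector are dropped, as if their selector were false.

# Parameters

data: the data to filter

selectors: the selection mask

from itertools import compress

# iterating over a dict yields its keys, in insertion order
x = list(compress({'name': '', 'age': ''}, [0, 1]))
print(x)  # ['age']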
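A sketch of the two points above: iteration stops at the shorter input, and the mask is usually built from a condition on the data itself.

```python
from itertools import compress

# 'D', 'E', 'F' have no selector, so they are dropped
short = list(compress('ABCDEF', [1, 0, 1]))
print(short)  # ['A', 'C']

# build the mask from a condition
data = [10, 15, 20, 25]
mask = [x > 12 for x in data]
selected = list(compress(data, mask))
print(selected)  # [15, 20, 25]
```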

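Grouping data: groupby

groupby groups the elements of an iterable by a key function. It only groups consecutive elements with equal keys, so the data usually has to be sorted by the same key first.

# Parameters

iterable: the iterable to group

key: the key function; if omitted, each element is its own key

from itertools import groupby

groups = groupby(['name', 'age'], key=len)
print([(k, list(g)) for k, g in groups])  # [(4, ['name']), (3, ['age'])]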
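The sketch below shows why sorting matters: without it, a key that reappears later starts a new group. Each group shares the underlying iterator with the groupby object, so it is converted to a list right away, before moving to the next group. The word list is made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# only consecutive equal keys are grouped, so 'A' shows up twice
runs = [(k, list(g)) for k, g in groupby('AAABBA')]
print(runs)  # [('A', ['A', 'A', 'A']), ('B', ['B', 'B']), ('A', ['A'])]

# sort by the same key first to get exactly one group per key
words = ['apple', 'bob', 'cat', 'avocado', 'banana']
first_letter = itemgetter(0)
by_letter = {k: list(g) for k, g in groupby(sorted(words, key=first_letter), key=first_letter)}
print(by_letter)  # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cat']}
```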

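Filtering data

dropwhile

# Parameters

predicate: the filter function

iterable: the iterable

from itertools import dropwhile

print(list(dropwhile(lambda x: x < 1, [False, False, 1, 2, 3])))  # [1, 2, 3]; False < 1 because False == 0

While predicate(item) is true, dropwhile discards the element; the first time predicate(item) is false, it returns that item and all subsequent items without testing them again. So if the predicate is false for the very first element, everything is kept, and if it is true for every element, nothing is returned.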
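A sketch of the "never tests again" behavior, followed by a practical use: skipping a leading block of comment lines. The sample lines are invented for illustration.

```python
from itertools import dropwhile

# the predicate fails at 6; the later 4 and 1 are kept although they are < 5
after = list(dropwhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
print(after)  # [6, 4, 1]

# skip only the leading comment lines; later comments stay
lines = ['# header', '# more header', 'data 1', '# not skipped', 'data 2']
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['data 1', '# not skipped', 'data 2']
```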

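takewhile

# Parameters

predicate: the filter function

iterable: the iterable

takewhile is the opposite of dropwhile: while predicate(item) is true, it keeps the element; as soon as predicate(item) is false, it stops iterating, so only the elements taken before that point are returned.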
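A minimal example, plus a subtlety worth knowing: takewhile has to read the first failing element to know it must stop, so when it runs over a shared iterator that element is consumed and lost.

```python
from itertools import takewhile

first = list(takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]))
print(first)  # [1, 4]

# on a shared iterator, the 6 that stopped takewhile is gone
it = iter([1, 4, 6, 4, 1])
head = list(takewhile(lambda x: x < 5, it))
rest = list(it)
print(head, rest)  # [1, 4] [4, 1]
```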

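filterfalse

from itertools import filterfalse

print(list(filterfalse(None, ['', 1, []])))  # ['', []]; with None as the predicate, the falsy elements are kept

filterfalse is the mirror image of the built-in filter: it keeps the elements for which the predicate returns false and drops those for which it returns true.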
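Side by side with filter, using the same predicate, the two split a sequence into complementary parts:

```python
from itertools import filterfalse


def is_odd(n):
    return n % 2 == 1


odds = list(filter(is_odd, range(10)))        # keeps predicate == True
evens = list(filterfalse(is_odd, range(10)))  # keeps predicate == False
print(odds)   # [1, 3, 5, 7, 9]
print(evens)  # [0, 2, 4, 6, 8]
```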

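ifilter: existed in Python 2 and was removed in Python 3; use the built-in filter instead.

starmap

starmap works much like map, except that each element of its iterable is itself an iterable of arguments (typically a tuple) that is unpacked into the call, i.e. it computes function(*args). This is how it passes multiple arguments.

from itertools import starmap

print(list(starmap(lambda x: x, [[1], [3]])))  # [1, 3]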
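A more typical case is a two-argument function: starmap takes the arguments already paired up, whereas map needs them as separate iterables. Both produce the same results here:

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap unpacks each tuple: pow(2, 5), pow(3, 2), pow(10, 3)
via_starmap = list(starmap(pow, pairs))

# map takes one iterable per argument instead
via_map = list(map(pow, [2, 3, 10], [5, 2, 3]))

print(via_starmap)  # [32, 9, 1000]
print(via_map)      # [32, 9, 1000]
```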

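Slicing data

islice

# Parameters

iterable: the iterable. Dicts and sets can be sliced as well; a dict iterates in insertion order (guaranteed since Python 3.7), but a set's iteration order is arbitrary, so which elements you get is unpredictable.

start: the start index

stop: the stop index (exclusive)

step: the step

from itertools import islice

print(list(islice([5, 4, 3, 4, 5, 6], 1, 3, 1)))  # [4, 3]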
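islice also accepts a single stop argument, uses None to mean "until the end", and works on infinite iterators. Unlike list slicing, it rejects negative indices with a ValueError. A short sketch:

```python
from itertools import count, islice

head_two = list(islice('ABCDEFG', 2))            # stop only
every_other = list(islice('ABCDEFG', 2, None, 2))  # stop=None: to the end
middle = list(islice(count(), 5, 10))           # an infinite source is fine
print(head_two, every_other, middle)  # ['A', 'B'] ['C', 'E', 'G'] [5, 6, 7, 8, 9]

# negative indices are not supported
negative_rejected = False
try:
    islice('ABC', -1)
except ValueError:
    negative_rejected = True
print(negative_rejected)  # True
```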

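Duplicating an iterator: tee

from itertools import tee

for t in tee([1, 2, 3], 4):
    print(id(t), list(t))  # a different id each time; each iterator yields [1, 2, 3]

The iterators created by tee are independent objects; by default tee creates two of them. Once tee has been called, the original iterator should not be used anymore, or the tee iterators may silently miss elements.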
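Independence means advancing one copy leaves the others untouched. A classic use is pairing each element with its successor; `pairwise_sketch` below is a hypothetical helper (Python 3.10 added a built-in itertools.pairwise for this).

```python
from itertools import tee

a, b = tee(iter([1, 2, 3]))  # n defaults to 2
next(a)                      # advancing a does not affect b
a_rest = list(a)
b_all = list(b)
print(a_rest, b_all)  # [2, 3] [1, 2, 3]


def pairwise_sketch(iterable):
    """Hypothetical helper: yield overlapping pairs (s0, s1), (s1, s2), ..."""
    x, y = tee(iterable)
    next(y, None)  # shift the second copy by one element
    return zip(x, y)


pairs = list(pairwise_sketch('ABCD'))
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```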

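Zipping iterators

izip: existed in Python 2; it does the same as Python 3's built-in zip.
zip_longest

# Parameters

*iterables: several iterables

fillvalue: the value used for missing elements (defaults to None)

from itertools import zip_longest

for z in zip_longest([1, 2, 3], [1, 2], fillvalue=10):
    print(z)  # (1, 1), then (2, 2), then (3, 10)

zip_longest is similar to zip, but while zip stops at the shortest iterable, zip_longest keeps going until the longest one is exhausted and fills the gaps with fillvalue.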
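A direct comparison with zip, plus the chunking idiom from the "grouper" recipe in the itertools documentation: passing the same iterator several times makes zip_longest split the data into fixed-size chunks, padding the last one.

```python
from itertools import zip_longest

short = list(zip([1, 2, 3], [1, 2]))
padded = list(zip_longest([1, 2, 3], [1, 2]))  # fillvalue defaults to None
print(short)   # [(1, 1), (2, 2)]
print(padded)  # [(1, 1), (2, 2), (3, None)]

# chunks of 3: the same iterator appears three times in the argument list
it = iter('ABCDEFG')
chunks = list(zip_longest(it, it, it, fillvalue='x'))
print(chunks)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```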

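Combinatoric iterators

Combinatoric iterators compute permutations, combinations and Cartesian products of sequences.

product: the Cartesian product of several iterables;

from itertools import product

print(list(product('ABC', '123')))

# Output

[('A', '1'), ('A', '2'), ('A', '3'), ('B', '1'), ('B', '2'), ('B', '3'), ('C', '1'), ('C', '2'), ('C', '3')]

# It is equivalent to:

def new_product():
    res = []
    for a in 'ABC':
        for b in '123':
            res.append((a, b))
    return iter(res)

# The difference: product yields the tuples lazily, one at a time, while new_product builds the whole result list in memory first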
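product also takes a `repeat` keyword, which is shorthand for passing the same iterable several times. The sketch also shows the laziness: the first tuple of a million-element product is available immediately.

```python
from itertools import product

# repeat=2 is the same as product('01', '01')
bits = list(product('01', repeat=2))
print(bits)  # [('0', '0'), ('0', '1'), ('1', '0'), ('1', '1')]

# the result size is the product of the input sizes: 3 * 3 * 3 = 27
size = len(list(product(range(3), repeat=3)))
print(size)  # 27

# lazy: no need to build all 100**3 = 1,000,000 tuples to get the first one
first = next(product(range(100), range(100), range(100)))
print(first)  # (0, 0, 0)
```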

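permutations: permutations in which each element (position) is used at most once;

from itertools import permutations

print(list(permutations('ABC', r=2)))

# Output

[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# Equivalent to (for r=2):

def new_permutations():
    x = 'ABC'
    res = []
    for i, y in enumerate(x):
        for j, z in enumerate(x):
            if i == j:  # skip the same position; permutations compares positions, not values
                continue
            res.append((y, z))
    return iter(res)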
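Two facts that follow from the definition: r defaults to the full length, and there are n! / (n - r)! permutations of length r. Because elements are distinguished by position, repeated values produce repeated tuples. A quick check:

```python
from itertools import permutations
from math import factorial

full_count = len(list(permutations('ABC')))  # r defaults to len('ABC')
print(full_count)  # 6

n, r = 4, 2
got = len(list(permutations('ABCD', r)))
expected = factorial(n) // factorial(n - r)
print(got, expected)  # 12 12

# equal values in different positions count as different elements
dup = list(permutations('AA'))
print(dup)  # [('A', 'A'), ('A', 'A')]
```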

combinations: combinations without repeated elements;

from itertools import combinations

print(list(combinations('ABC', r=2)))

# Output

[('A', 'B'), ('A', 'C'), ('B', 'C')]

# Equivalent to (for r=2):

def new_combinations():
    x = 'ABC'
    res = []
    for i, y in enumerate(x):
        for z in x[i + 1:]:
            res.append((y, z))
    return iter(res)
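The number of combinations of length r from n elements is the binomial coefficient C(n, r), which `math.comb` (Python 3.8+) computes directly. A typical use is checking every unordered pair; the numbers below are illustrative.

```python
from itertools import combinations
from math import comb  # Python 3.8+

count_5_3 = len(list(combinations(range(5), 3)))
print(count_5_3, comb(5, 3))  # 10 10

# every unordered pair whose sum is 6
nums = [1, 2, 3, 4, 5]
pairs = [(a, b) for a, b in combinations(nums, 2) if a + b == 6]
print(pairs)  # [(1, 5), (2, 4)]
```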

combinations_with_replacement: combinations in which elements may repeat;

from itertools import combinations_with_replacement

print(list(combinations_with_replacement('ABC', r=2)))

# Output

[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]

# Equivalent to (for r=2):

def new_combinations_with_replacement():
    x = 'ABC'
    res = []
    for i, y in enumerate(x):
        for z in x[i:]:
            res.append((y, z))
    return iter(res)
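Allowing repetition changes the count to C(n + r - 1, r). As an illustration, the unordered outcomes of rolling two six-sided dice:

```python
from itertools import combinations_with_replacement
from math import comb  # Python 3.8+

rolls = list(combinations_with_replacement(range(1, 7), 2))
print(len(rolls), comb(6 + 2 - 1, 2))  # 21 21
print(rolls[:3])  # [(1, 1), (1, 2), (1, 3)]
```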

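Summary

In general, use an iterator wherever one will do: producing elements lazily can greatly reduce memory usage.
The iterators from this module I use most often are chain, dropwhile and zip_longest.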
