Reprinted from:

Iterables vs. Iterators vs. Generators

Occasionally I've run into confusion about the exact differences between the following related concepts in Python:

  • a container
  • an iterable
  • an iterator
  • a generator
  • a generator expression
  • a {list, set, dict} comprehension

I'm writing this post as a pocket reference for later.

Containers

Containers are data structures that hold elements and support membership tests. They live in memory, and typically hold all their values in memory, too. In Python, some well-known examples are:

  • list, deque, …
  • set, frozenset, …
  • dict, defaultdict, OrderedDict, Counter, …
  • tuple, namedtuple, …
  • str

Containers are easy to grasp, because you can think of them as real-life containers: a box, a cupboard, a house, a ship, etc.

Technically, an object is a container when it can be asked whether it contains a certain element. You can perform such membership tests on lists, sets, or tuples alike:

>>> assert 1 in [1, 2, 3]      # lists
>>> assert 4 not in [1, 2, 3]
>>> assert 1 in {1, 2, 3} # sets
>>> assert 4 not in {1, 2, 3}
>>> assert 1 in (1, 2, 3) # tuples
>>> assert 4 not in (1, 2, 3)

Dict membership will check the keys:

>>> d = {1: 'foo', 2: 'bar', 3: 'qux'}
>>> assert 1 in d
>>> assert 4 not in d
>>> assert 'foo' not in d # 'foo' is not a _key_ in the dict

Finally you can ask a string if it "contains" a substring:

>>> s = 'foobar'
>>> assert 'b' in s
>>> assert 'x' not in s
>>> assert 'foo' in s # a string "contains" all its substrings

The last example is a bit strange, but it shows how the container interface renders the object opaque. A string does not literally store copies of all of its substrings in memory, but you can certainly use it that way.

NOTE:
Even though most containers provide a way to produce every element they contain, that ability does not make them a container but an iterable (we'll get there in a minute).

Not all containers are necessarily iterable. An example of this is a Bloom filter. Probabilistic data structures like this can be asked whether they contain a certain element, but they are unable to return their individual elements.
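To make that distinction concrete, here is a minimal sketch of a container that is not iterable. The class name EvenNumbers is made up for illustration, and it is not a Bloom filter; it is just the simplest object I could think of that answers membership tests without being able to list its elements.

```python
class EvenNumbers:
    """Hypothetical container: answers membership tests, stores nothing."""

    def __contains__(self, item):
        # `item in evens` ends up calling this method.
        return isinstance(item, int) and item % 2 == 0


evens = EvenNumbers()
assert 4 in evens
assert 3 not in evens

# There is no __iter__() (and no __getitem__()), so iter() refuses it.
try:
    iter(evens)
    iterable = True
except TypeError:
    iterable = False
```

Here iterable ends up False: the object is a container, but not an iterable.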

Iterables

As said, most containers are also iterable. But many more things are iterable as well. Examples are open files, open sockets, etc. Where containers are typically finite, an iterable may just as well represent an infinite source of data.

An iterable is any object, not necessarily a data structure, that can return an iterator (with the purpose of returning all of its elements). That sounds a bit awkward, but there is an important difference between an iterable and an iterator. Take a look at this example:

>>> x = [1, 2, 3]
>>> y = iter(x)
>>> z = iter(x)
>>> next(y)
1
>>> next(y)
2
>>> next(z)
1
>>> type(x)
<class 'list'>
>>> type(y)
<class 'list_iterator'>

Here, x is the iterable, while y and z are two individual instances of an iterator, producing values from the iterable x. Both y and z hold state, as you can see from the example. In this example, x is a data structure (a list), but that is not a requirement.

NOTE:
Often, for pragmatic reasons, iterable classes will implement both __iter__() and __next__() in the same class, and have __iter__() return self, which makes the class both an iterable and its own iterator. It is perfectly fine to return a different object as the iterator, though.
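As a rough sketch of the "different object" variant: the Countdown and CountdownIterator classes below are hypothetical names invented for this example. The iterable hands out a fresh iterator on every iter() call, so it can be traversed more than once, while each individual iterator is used up after one pass.

```python
class Countdown:
    """Hypothetical iterable: returns a new iterator on every iter() call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """The stateful helper that does the actual counting."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        # Iterators return themselves, so they also work in for-loops.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


c = Countdown(3)
first_pass = list(c)
second_pass = list(c)   # works again: each pass gets its own iterator

it = iter(c)
once = list(it)
twice = list(it)        # an exhausted iterator stays exhausted
```

Both passes over c give [3, 2, 1], whereas the second list(it) is empty. A class that returns self from __iter__() behaves like it, not like c.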

Finally, when you write:

x = [1, 2, 3]
for elem in x:
    ...

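This is what actually happens: Python first obtains an iterator from x, and then pulls values from that iterator one at a time until it is exhausted.

When you disassemble this Python code, you can see the explicit call to GET_ITER, which is essentially like invoking iter(x). The FOR_ITER is an instruction that will do the equivalent of calling next() repeatedly to get every element, but this does not show from the byte code instructions because it's optimized for speed in the interpreter. (The listing below comes from an older CPython release; the exact instructions vary between versions, and newer releases no longer emit SETUP_LOOP.)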

>>> import dis
>>> x = [1, 2, 3]
>>> dis.dis('for _ in x: pass')
  1           0 SETUP_LOOP              14 (to 17)
              3 LOAD_NAME                0 (x)
              6 GET_ITER
        >>    7 FOR_ITER                 6 (to 16)
             10 STORE_NAME               1 (_)
             13 JUMP_ABSOLUTE            7
        >>   16 POP_BLOCK
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE
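In plain Python, the loop behaves roughly like the hand-written version below. This is only a sketch of the semantics, not what the interpreter literally executes; the seen list is just there so the result can be inspected.

```python
x = [1, 2, 3]
seen = []

it = iter(x)                # what GET_ITER does
while True:
    try:
        elem = next(it)     # what FOR_ITER does on each pass
    except StopIteration:
        break               # the iterator is exhausted, leave the loop
    seen.append(elem)       # the loop body
```

The for statement hides the StopIteration handling from you, which is why you rarely see that exception in everyday code.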

Iterators

So what is an iterator then? It's a stateful helper object that will produce the next value when you call next() on it. Any object that has a __next__() method can therefore be driven as an iterator (the full iterator protocol also expects an __iter__() method that returns the object itself). How it produces a value is irrelevant.

So an iterator is a value factory. Each time you ask it for "the next" value, it knows how to compute it because it holds internal state.

There are countless examples of iterators. All of the itertools functions return iterators. Some produce infinite sequences:

>>> from itertools import count
>>> counter = count(start=13)
>>> next(counter)
13
>>> next(counter)
14

Some produce infinite sequences from finite sequences:

>>> from itertools import cycle
>>> colors = cycle(['red', 'white', 'blue'])
>>> next(colors)
'red'
>>> next(colors)
'white'
>>> next(colors)
'blue'
>>> next(colors)
'red'

Some produce finite sequences from infinite sequences:

>>> from itertools import islice
>>> colors = cycle(['red', 'white', 'blue']) # infinite
>>> limited = islice(colors, 0, 4) # finite
>>> for x in limited: # so safe to use for-loop on
...     print(x)
...
red
white
blue
red

To get a better sense of the internals of an iterator, let's build an iterator producing the Fibonacci numbers:

>>> class fib:
...     def __init__(self):
...         self.prev = 0
...         self.curr = 1
...
...     def __iter__(self):
...         return self
...
...     def __next__(self):
...         value = self.curr
...         self.curr += self.prev
...         self.prev = value
...         return value
...
>>> f = fib()
>>> list(islice(f, 0, 10))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Note that this class is both an iterable (because it sports an __iter__() method), and its own iterator (because it has a __next__() method).

The state of this iterator is kept entirely in the prev and curr instance variables, which are used by subsequent calls to the iterator. Every call to next() does two important things:

  1. Modify its state for the next next() call;
  2. Produce the result for the current call.

Central idea: a lazy factory
From the outside, the iterator is like a lazy factory that is idle until you ask it for a value, which is when it starts to buzz and produce a single value, after which it turns idle again.
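A small sketch can make the "idle until asked" behaviour visible. The Squares class and its calls counter are hypothetical, added only so we can observe how much work has been done at each point.

```python
class Squares:
    """Hypothetical iterator over 1, 4, 9, ... that counts its own work."""

    def __init__(self):
        self.n = 0
        self.calls = 0          # how many values have been computed so far

    def __iter__(self):
        return self

    def __next__(self):
        self.n += 1             # modify the state for the next call
        self.calls += 1
        return self.n * self.n  # produce the result for this call


s = Squares()
calls_before = s.calls          # nothing has been computed yet
a = next(s)
b = next(s)
calls_after = s.calls           # exactly two values were computed
```

Creating the iterator costs nothing; each value is computed only at the moment next() asks for it.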

Generators

Finally, we've arrived at our destination! Generators are my absolute favorite Python language feature. A generator is a special kind of iterator: the elegant kind.

A generator allows you to write iterators much like the Fibonacci sequence iterator example above, but in an elegant, succinct syntax that avoids writing classes with __iter__() and __next__() methods.

Let's be explicit:

  • Any generator also is an iterator (not vice versa!);
  • Any generator, therefore, is a factory that lazily produces values.

Here is the same Fibonacci sequence factory, but written as a generator:

>>> def fib():
...     prev, curr = 0, 1
...     while True:
...         yield curr
...         prev, curr = curr, prev + curr
...
>>> f = fib()
>>> list(islice(f, 0, 10))
[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

Wow, isn't that elegant? Notice the magic keyword that's responsible for the beauty:

yield

Let's break down what happened here: first of all, take note that fib is defined as a normal Python function, nothing special. Notice, however, that there's no return keyword inside the function body. The return value of the function will be a generator (read: an iterator, a factory, a stateful helper object).

Now when f = fib() is called, the generator (the factory) is instantiated and returned. No code will be executed at this point: the generator starts in an idle state initially. To be explicit: the line prev, curr = 0, 1 is not executed yet.

Then, this generator instance is wrapped in an islice(). This is itself also an iterator, so idle initially. Nothing happens, still.

Then, this iterator is wrapped in a list(), which will consume all of the items of its argument and build a list from them. To do so, it will start calling next() on the islice() instance, which in turn will start calling next() on our f instance.

But one step at a time. On the first invocation, the code will finally run a bit: prev, curr = 0, 1 gets executed, the while True loop is entered, and then it encounters the yield curr statement. It will produce the value that's currently in the curr variable and become idle again.

This value is passed to the islice() wrapper, which will produce it (because it's not past the 10th value yet), and list can add the value 1 to the list now.

Then, it asks islice() for the next value, which will ask f for the next value, which will "unpause" f from its previous state, resuming with the statement prev, curr = curr, prev + curr. Then it re-enters the next iteration of the while loop, and hits the yield curr statement, returning the next value of curr.

This happens until the output list is 10 elements long. When list() asks islice() for the 11th value, islice() will raise a StopIteration exception, indicating that the end has been reached, and list will return the result: a list of 10 items, containing the first 10 Fibonacci numbers. Notice that the generator doesn't receive the 11th next() call. In fact, it will not be used again, and will be garbage collected later.
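The walkthrough above can be checked with a small experiment. This is a sketch that adds a hypothetical log list to the generator so we can see exactly when its body runs; it uses 3 values instead of 10 to keep the log short.

```python
from itertools import islice

log = []

def fib():
    log.append('started')           # runs on the first next(), not on fib()
    prev, curr = 0, 1
    while True:
        log.append(f'yield {curr}')
        yield curr
        prev, curr = curr, prev + curr

f = fib()
log_after_call = list(log)          # still empty: no body code has run

limited = islice(f, 0, 3)
log_after_islice = list(log)        # still empty: islice is lazy too

result = list(limited)              # now the generator is actually driven
log_after_list = list(log)

resumed = next(f)                   # f was paused after its third yield
```

After list() has run, the log holds 'started' followed by three yield entries, and no fourth one: islice() stopped without asking f again. Calling next(f) by hand afterwards picks up right where the generator was paused and produces 3.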

Types of Generators

There are two types of generators in Python: generator functions and generator expressions. A generator function is any function in which the keyword yield appears in its body. We just saw an example of that. The appearance of the keyword yield is enough to make the function a generator function.
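If you want to verify this yourself, the standard library's inspect module can tell the two kinds of function apart. A minimal sketch (the function names plain and gen are arbitrary):

```python
import inspect

def plain():
    return 1

def gen():
    yield 1

is_plain_genfunc = inspect.isgeneratorfunction(plain)  # False
is_gen_genfunc = inspect.isgeneratorfunction(gen)      # True

g = gen()                                   # calling it returns a generator
is_generator_object = inspect.isgenerator(g)
has_iterator_methods = hasattr(g, '__iter__') and hasattr(g, '__next__')
```

The generator object has both __iter__() and __next__(), which is what makes it an iterator.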

The other type of generator is the generator equivalent of a list comprehension. Its syntax is really elegant for a limited use case.

Suppose you use this syntax to build a list of squares:

>>> numbers = [1, 2, 3, 4, 5, 6]
>>> [x * x for x in numbers]
[1, 4, 9, 16, 25, 36]

You could do the same thing with a set comprehension:

>>> {x * x for x in numbers}
{1, 4, 36, 9, 16, 25}

Or a dict comprehension:

>>> {x: x * x for x in numbers}
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36}

But you can also use a generator expression (note: this is not a tuple comprehension):

>>> lazy_squares = (x * x for x in numbers)
>>> lazy_squares
<generator object <genexpr> at 0x10d1f5510>
>>> next(lazy_squares)
1
>>> list(lazy_squares)
[4, 9, 16, 25, 36]

Note that, because we read the first value from lazy_squares with next(), its state is now at the "second" item, so when we consume it entirely by calling list(), that will only return the partial list of squares. (This is just to show the lazy behaviour.) This is as much a generator (and thus, an iterator) as the other examples above.
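Two practical consequences are worth a quick sketch: a generator expression can be passed straight to a function that consumes an iterable, and, like any iterator, it can only be consumed once.

```python
numbers = [1, 2, 3, 4, 5, 6]

# The call's parentheses double as the generator expression's parentheses,
# and no intermediate list of squares is built.
total = sum(x * x for x in numbers)

lazy_squares = (x * x for x in numbers)
first_pass = list(lazy_squares)
second_pass = list(lazy_squares)    # already exhausted, nothing left
```

Here total is 91, first_pass holds all six squares, and second_pass is an empty list.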

Summary

Generators are an incredibly powerful programming construct. They allow you to write streaming code with fewer intermediate variables and data structures. Because values are produced one at a time instead of being collected in a list first, they often use less memory and can avoid computing values that are never consumed. Finally, they tend to require fewer lines of code, too.

Tip to get started with generators: find places in your code where you do the following:

def something():
    result = []
    for ... in ...:
        result.append(x)
    return result

And replace it by:

def iter_something():
    for ... in ...:
        yield x

# def something():  # Only if you really need a list structure
#     return list(iter_something())
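Here is what that refactoring might look like on a concrete case. The function names and the "squares of even numbers" task are made up for illustration.

```python
def iter_even_squares(numbers):
    """Hypothetical example: lazily yield the squares of the even inputs."""
    for n in numbers:
        if n % 2 == 0:
            yield n * n


def even_squares(numbers):
    # Only if you really need a list structure.
    return list(iter_even_squares(numbers))


# The lazy version can start answering without walking the whole input.
first = next(iter_even_squares(range(1, 10**9)))

as_list = even_squares([1, 2, 3, 4])
```

Asking for just the first value returns 4 immediately, even though the input range has about a billion elements; the list-building version would have to process all of them first.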
