What does the "yield" keyword do?
What is the use of the yield keyword in Python? What does it do?
For example, I'm trying to understand this code [1]:
    def _get_child_candidates(self, distance, min_dist, max_dist):
        if self._leftchild and distance - max_dist < self._median:
            yield self._leftchild
        if self._rightchild and distance + max_dist >= self._median:
            yield self._rightchild
And this is the caller:
    result, candidates = list(), [self]
    while candidates:
        node = candidates.pop()
        distance = node._get_dist(obj)
        if distance <= max_dist and distance >= min_dist:
            result.extend(node._values)
        candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
    return result
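What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When will subsequent calls stop?
1. The code comes from Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.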
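To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.
Iterables
When you create a list, you can read its items one by one. Reading its items one by one is called iteration: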
    >>> mylist = [1, 2, 3]
    >>> for i in mylist:
    ...     print(i)
    1
    2
    3
mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:
    >>> mylist = [x*x for x in range(3)]
    >>> for i in mylist:
    ...     print(i)
    0
    1
    4
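Everything you can use "for ... in ..." on is an iterable: lists, strings, files...
These iterables are handy because you can read them as many times as you wish, but all the values are stored in memory, and that is not always what you want when there are a lot of values.
Generators
Generators are iterators, a kind of iterable that you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly: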
    >>> mygenerator = (x*x for x in range(3))
    >>> for i in mygenerator:
    ...     print(i)
    0
    1
    4
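It is just the same, except that you used () instead of []. But you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish by calculating 4, one value at a time.
Yield
yield is a keyword that is used like return, except that the function will return a generator.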
    >>> def createGenerator():
    ...     mylist = range(3)
    ...     for i in mylist:
    ...         yield i*i
    ...
    >>> mygenerator = createGenerator() # create a generator
    >>> print(mygenerator) # mygenerator is an object!
    <generator object createGenerator at 0xb7555c34>
    >>> for i in mygenerator:
    ...     print(i)
    0
    1
    4
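This is a useless example, but it is handy when you know your function will return a huge set of values that you will only need to read once.
To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object, which is a bit tricky :-)
Then, your code will run, picking up where it left off, each time the for loop uses the generator.
Now the hard part:
The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then it returns the first value of the loop. Each subsequent call runs one more iteration of the loop you wrote in the function and returns the next value. This continues until there is no value left to return.
The generator is considered empty once the function runs without hitting yield anymore. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.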
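To watch this lazy start happen, here is a minimal sketch (Python 3). The log list and the function name are illustrative and not part of the code in the question; the list only records when the function body actually runs.

```python
log = []  # hypothetical trace list, used only to observe when the body runs

def create_generator():
    log.append("body started")
    for i in range(3):
        yield i * i
    log.append("body finished")

gen = create_generator()        # calling the function runs none of its body
log_after_call = list(log)      # still empty at this point

first = next(gen)               # runs the body up to the first yield
log_after_first = list(log)

rest = list(gen)                # resumes after the yield and runs to the end
```

Nothing is logged by the call itself. The first next() runs the body up to the first yield, and "body finished" is only logged once the generator has been exhausted.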
Your code explained
Generator:
    # Here you create the method of the node object that will return the generator
    def _get_child_candidates(self, distance, min_dist, max_dist):

        # Here is the code that will be called each time you use the generator object:

        # If there is still a child of the node object on its left
        # AND if distance is ok, return the next child
        if self._leftchild and distance - max_dist < self._median:
            yield self._leftchild

        # If there is still a child of the node object on its right
        # AND if distance is ok, return the next child
        if self._rightchild and distance + max_dist >= self._median:
            yield self._rightchild

        # If the function arrives here, the generator will be considered empty
        # there is no more than two values: the left and the right children
Caller:
    # Create an empty list and a list with the current object reference
    result, candidates = list(), [self]

    # Loop on candidates (they contain only one element at the beginning)
    while candidates:

        # Get the last candidate and remove it from the list
        node = candidates.pop()

        # Get the distance between obj and the candidate
        distance = node._get_dist(obj)

        # If distance is ok, then you can fill the result
        if distance <= max_dist and distance >= min_dist:
            result.extend(node._values)

        # Add the children of the candidate in the candidates list
        # so the loop will keep running until it will have looked
        # at all the children of the children of the children, etc. of the candidate
        candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

    return result
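This code contains several smart parts:
- The loop iterates over a list, but the list grows while the loop is running :-) It is a concise way to walk through all this nested data, even if it is a bit dangerous, since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects, which produce different values from the previous ones because they are not applied to the same node.
- The extend() method is a list method that expects an iterable and adds its values to the list.
Usually we pass a list to it: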
    >>> a = [1, 2]
    >>> b = [3, 4]
    >>> a.extend(b)
    >>> print(a)
    [1, 2, 3, 4]
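But in your code it gets a generator, which is good because:
- You don't need to read the values twice.
- You may have a lot of children and you don't want them all stored in memory.
And it works because Python does not care whether the argument of a method is a list or not. Python expects iterables, so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But that is another story, for another question...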
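As a quick illustration of that duck typing, the sketch below (Python 3) feeds three different kinds of iterable to list.extend. The children function is a hypothetical stand-in for _get_child_candidates, not the real method.

```python
def children():
    # hypothetical stand-in for _get_child_candidates
    yield "left"
    yield "right"

items = [1, 2]
items.extend((3, 4))        # a tuple
items.extend("ab")          # a string is iterated character by character
items.extend(children())    # a generator is consumed until it is exhausted
```

In every case extend only asks its argument for an iterator and pulls values from it, so the concrete type does not matter.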
You can stop here, or read a little further to see an advanced use of a generator:
Controlling a generator exhaustion
    >>> class Bank(): # let's create a bank, building ATMs
    ...     crisis = False
    ...     def create_atm(self):
    ...         while not self.crisis:
    ...             yield "$100"
    >>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
    >>> corner_street_atm = hsbc.create_atm()
    >>> print(corner_street_atm.next())
    $100
    >>> print(corner_street_atm.next())
    $100
    >>> print([corner_street_atm.next() for cash in range(5)])
    ['$100', '$100', '$100', '$100', '$100']
    >>> hsbc.crisis = True # crisis is coming, no more money!
    >>> print(corner_street_atm.next())
    <type 'exceptions.StopIteration'>
    >>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
    >>> print(wall_street_atm.next())
    <type 'exceptions.StopIteration'>
    >>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
    >>> print(corner_street_atm.next())
    <type 'exceptions.StopIteration'>
    >>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
    >>> for cash in brand_new_atm:
    ...     print cash
    $100
    $100
    $100
    $100
    $100
    $100
    $100
    $100
    $100
    ...
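It can be useful for various things, such as controlling access to a resource.
Itertools, your best friend
The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?
Then just import itertools.
An example? Let's see the possible orders of arrival for a four-horse race: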
    >>> import itertools
    >>> horses = [1, 2, 3, 4]
    >>> races = itertools.permutations(horses)
    >>> print(races)
    <itertools.permutations object at 0xb754f1dc>
    >>> print(list(itertools.permutations(horses)))
    [(1, 2, 3, 4), (1, 2, 4, 3), (1, 3, 2, 4), (1, 3, 4, 2),
     (1, 4, 2, 3), (1, 4, 3, 2), (2, 1, 3, 4), (2, 1, 4, 3),
     (2, 3, 1, 4), (2, 3, 4, 1), (2, 4, 1, 3), (2, 4, 3, 1),
     (3, 1, 2, 4), (3, 1, 4, 2), (3, 2, 1, 4), (3, 2, 4, 1),
     (3, 4, 1, 2), (3, 4, 2, 1), (4, 1, 2, 3), (4, 1, 3, 2),
     (4, 2, 1, 3), (4, 2, 3, 1), (4, 3, 1, 2), (4, 3, 2, 1)]
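A few of the other operations mentioned above can be sketched the same way (Python 3). The countdown generator is a made-up example; chain, islice and chain.from_iterable are standard itertools functions.

```python
import itertools

def countdown(n):
    # hypothetical example generator: n, n-1, ..., 1
    while n > 0:
        yield n
        n -= 1

# Chain two generators without building an intermediate list
chained = list(itertools.chain(countdown(3), countdown(2)))

# Take a slice of a generator lazily; only two values are ever produced
first_two = list(itertools.islice(countdown(100), 2))

# Flatten a nested list with a one-liner
flat = list(itertools.chain.from_iterable([[1, 2], [3], [4, 5]]))
```

The list() calls are only there to make the results visible; in real code you would usually loop over the itertools objects directly.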
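Understanding the inner mechanisms of iteration
Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.
There is more about it in this article about how for loops work.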
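Shortcut to understanding yield
When you see a function with yield statements, apply this easy trick to understand what will happen:
- Insert a line result = [] at the start of the function.
- Replace each yield expr with result.append(expr).
- Insert a line return result at the bottom of the function.
- Yay, no more yield statements! Read and figure out the code.
- Compare the function to the original definition.
This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different from what happens in the list-based approach. In many cases the yield approach will be a lot more memory-efficient, and faster too. In other cases this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...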
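Don't confuse your iterables, iterators, and generators
First, the iterator protocol. When you write
    for x in mylist:
        ...loop body...
Python performs the following two steps:
- Gets an iterator for mylist: it calls iter(mylist), which returns an object with a next() method (or __next__() in Python 3). [This is the step most people forget to tell you about.]
- Uses the iterator to loop over the items: it keeps calling the next() method on the iterator returned from step 1. The return value of next() is assigned to x and the loop body is executed. If a StopIteration exception is raised from within next(), it means there are no more values in the iterator, and the loop is exited.
The truth is that Python performs these two steps any time it wants to loop over the contents of an object, so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).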
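Here mylist is an iterable, because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method. It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.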
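One way around that last limitation, sketched here in Python 3 with a made-up Countdown class, is to write __iter__() as a generator function. Each call then returns a fresh, independent iterator, so nested loops over the same object work.

```python
class Countdown:
    """Hypothetical iterable: every __iter__ call returns a new iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Because this method contains yield, calling it returns a new
        # generator whose state (n) is private to that generator.
        n = self.start
        while n > 0:
            yield n
            n -= 1

c = Countdown(3)
# Two loops over the same object at the same time
pairs = [(a, b) for a in c for b in c]
```

Had __iter__() returned self with the counter stored on the instance, the inner loop would have used up the values the outer loop still needed.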
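That's the iterator protocol; many objects implement this protocol:
- Built-in lists, dictionaries, tuples, sets, files.
- User-defined classes that implement __iter__().
- Generators.
Note that a for loop doesn't know what kind of object it is dealing with: it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return their keys one by one, files return their lines one by one, and so on. And generators return... well, that's where yield comes in: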
    def f123():
        yield 1
        yield 2
        yield 3

    for item in f123():
        print(item)
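If, instead of yield statements, there were three return statements in f123(), only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit: it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the line right after the yield it previously returned from, runs until the next yield statement, and returns that value as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.
So the generator object is sort of like an adapter: at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and then puts it back in suspended mode.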
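Why use generators?
Usually you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list "trick" I mentioned before. That will not work in all cases, e.g. if you have infinite loops, and it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps its state in instance members and performs the next logical step in its next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.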
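Think of it this way:
An iterator is just a fancy-sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:
Original version: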
    def some_function():
        for i in xrange(4):
            yield i

    for i in some_function():
        print i
This is basically what the Python interpreter does with the above code:
    class it:
        def __init__(self):
            # start at -1 so that we get 0 when we add 1 below.
            self.count = -1

        # the __iter__ method will be called once by the for loop.
        # the rest of the magic happens on the object returned by this method.
        # in this case it is the object itself.
        def __iter__(self):
            return self

        # the next method will be called repeatedly by the for loop
        # until it raises StopIteration.
        def next(self):
            self.count += 1
            if self.count < 4:
                return self.count
            else:
                # a StopIteration exception is raised
                # to signal that the iterator is done.
                # This is caught implicitly by the for loop.
                raise StopIteration

    def some_func():
        return it()

    for i in some_func():
        print i
For more insight into what is happening behind the scenes, the for loop can be rewritten like this:
    iterator = some_func()
    try:
        while 1:
            print iterator.next()
    except StopIteration:
        pass
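Does that make more sense, or does it just confuse you more? 🙂
Edit: I should note that this is an oversimplification for illustrative purposes. 🙂
Edit 2: Forgot to raise the StopIteration exception.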
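The yield keyword is reduced to two simple facts:
- If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy "pending list" object called a generator.
- A generator is iterable. What is an iterable? It is anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.
In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.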
    generator = myYieldingFunction(...)
    x = list(generator)

       generator
           v
    [x[0], ..., ???]

             generator
                 v
    [x[0], x[1], ..., ???]

                   generator
                       v
    [x[0], x[1], x[2], ..., ???]

                           StopIteration exception
    [x[0], x[1], x[2]]     done

    list==[x[0], x[1], x[2]]
Example
Let's define a function makeRange that is just like Python's range. Calling makeRange(n) RETURNS A GENERATOR:
    def makeRange(n):
        # return 0,1,2,...,n-1
        i = 0
        while i < n:
            yield i
            i += 1

    >>> makeRange(5)
    <generator object makeRange at 0x19e4aa0>
To force the generator to immediately return its pending values, you can pass it into list() (just like you could with any iterable):
    >>> list(makeRange(5))
    [0, 1, 2, 3, 4]
Comparing the example to "just returning a list"
The above example can be thought of as merely creating a list which you append to and return:
    # list-version                   #  # generator-version
    def makeRange(n):                #  def makeRange(n):
        """return [0,1,2,...,n-1]""" #~     """return 0,1,2,...,n-1"""
        TO_RETURN = []               #>
        i = 0                        #      i = 0
        while i < n:                 #      while i < n:
            TO_RETURN += [i]         #~         yield i
            i += 1                   #          i += 1  ## indented
        return TO_RETURN             #>

    >>> makeRange(5)
    [0, 1, 2, 3, 4]
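There is one major difference, though; see the last section.
How you might use generators
An iterable is the last part of a list comprehension, and all generators are iterable, so they are often used like this: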
    #                   _ITERABLE_
    >>> [x+10 for x in makeRange(5)]
    [10, 11, 12, 13, 14]
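To get a better feel for generators, you can play around with the itertools module (be sure to use chain.from_iterable rather than chain when warranted). For example, you might even use generators to implement infinitely long lazy lists like itertools.count(). You could implement your own def enumerate(iterable): return zip(count(), iterable), or alternatively do so with the yield keyword in a while loop.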
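Both variants of a home-made enumerate can be sketched as follows (Python 3). The function names are made up for illustration, and the yield version uses a for loop with a manual counter rather than a while loop, for brevity.

```python
from itertools import count

def enumerate_with_zip(iterable):
    # count() is an infinite lazy sequence 0, 1, 2, ...;
    # zip stops as soon as the shorter input is exhausted
    return zip(count(), iterable)

def enumerate_with_yield(iterable):
    index = 0
    for item in iterable:
        yield index, item
        index += 1

zipped = list(enumerate_with_zip("ab"))
yielded = list(enumerate_with_yield("ab"))
```

Both produce the same pairs as the built-in enumerate, and neither builds the full result up front.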
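Please note: generators can actually be used for many more things, such as implementing coroutines or non-deterministic programming or other elegant things. However, the "lazy lists" viewpoint I present here is the most common use you will find.
Behind the scenes
This is how the "Python iteration protocol" works. That is, what is going on when you do list(makeRange(5)). This is what I described earlier as a "lazy, incremental list".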
    >>> x=iter(range(5))
    >>> next(x)
    0
    >>> next(x)
    1
    >>> next(x)
    2
    >>> next(x)
    3
    >>> next(x)
    4
    >>> next(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
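The built-in function next() just calls the object's .next() method (.__next__() in Python 3), which is part of the "iteration protocol" and is found on all iterators. You can manually use the next() function (and other parts of the iteration protocol) to implement fancy things, usually at the expense of readability, so try to avoid doing that...
Minutiae
Normally, most people would not care about the following distinctions and probably want to stop reading here.
In Python-speak, an iterable is any object which "understands the concept of a for loop", like a list [1,2,3], and an iterator is a specific instance of the requested for loop, like [1,2,3].__iter__(). A generator is exactly the same as any iterator, except for the way it was written (with function syntax).
When you request an iterator from a list, it creates a new iterator. However, when you request an iterator from an iterator (which you would rarely do), it just hands you the same iterator back.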
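Thus, in the unlikely event that you are failing to do something like this...
    > x = makeRange(5)
    > list(x)
    [0, 1, 2, 3, 4]
    > list(x)
    []
...then remember that a generator is an iterator; that is, it is one-time-use. If you want to reuse it, you should call makeRange(...) again. If you need to use the result twice, convert the result to a list and store it in a variable: x = list(makeRange(5)). Those who absolutely need to clone a generator (for example, people doing terrifyingly hackish metaprogramming) can use itertools.tee if absolutely necessary, since the copyable iterator Python PEP standards proposal has been deferred.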
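The two ways of reusing a generator's values can be sketched like this (Python 3). make_range is a renamed copy of the example generator, used here only for illustration.

```python
import itertools

def make_range(n):
    # same idea as the range-like generator in this answer
    i = 0
    while i < n:
        yield i
        i += 1

# Option 1: materialise once, then reuse the list as often as needed
stored = list(make_range(3))
total, largest = sum(stored), max(stored)

# Option 2: itertools.tee gives independent iterators over one generator
first, second = itertools.tee(make_range(3))
first_values = list(first)
second_values = list(second)
```

tee buffers values that one iterator has seen and the other has not, so if one copy runs far ahead, a plain list is usually the simpler choice.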
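yield is just like return: it returns whatever you tell it to. The only difference is that the next time a value is requested, execution resumes right after the last yield statement instead of starting from the top.
In the case of your code, the function get_child_candidates is acting like an iterator, so that when you extend your list, it adds one element at a time to the new list.
list.extend calls an iterator until it is exhausted. In the case of the code sample you posted, it would be much clearer to just return a tuple and append that to the list.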
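What does the yield keyword do in Python?
Answer Outline/Summary
- A function with yield, when called, returns a Generator.
- Generators are iterators because they implement the iterator protocol, so you can iterate over them.
- A generator can also be sent information, making it conceptually a coroutine.
- In Python 3, you can delegate from one generator to another in both directions with yield from.
- (The appendix critiques a couple of answers, including the top one, and discusses the use of return in a generator.)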
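Generators:
yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator.
The idea for generators comes from other languages (see footnote 1) with varying implementations. In Python's generators, the execution of the code is frozen at the point of the yield. When the generator is called (methods are discussed below), execution resumes and then freezes at the next yield.
yield provides an easy way of implementing the iterator protocol, defined by the following two methods: __iter__ and next (Python 2) or __next__ (Python 3). Both of those methods make an object an iterator that you could type-check with the Iterator Abstract Base Class from the collections module.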
    >>> def func():
    ...     yield 'I am'
    ...     yield 'a generator!'
    ...
    >>> type(func)                 # A function with yield is still a function
    <type 'function'>
    >>> gen = func()
    >>> type(gen)                  # but it returns a generator
    <type 'generator'>
    >>> hasattr(gen, '__iter__')   # that's an iterable
    True
    >>> hasattr(gen, 'next')       # and with .next (.__next__ in Python 3)
    True                           # implements the iterator protocol.
The generator type is a sub-type of iterator:
    >>> import collections, types
    >>> issubclass(types.GeneratorType, collections.Iterator)
    True
And if necessary, we can type-check like this:
    >>> isinstance(gen, types.GeneratorType)
    True
    >>> isinstance(gen, collections.Iterator)
    True
A feature of an Iterator is that once exhausted, you can't reuse or reset it:
    >>> list(gen)
    ['I am', 'a generator!']
    >>> list(gen)
    []
You'll have to make another one if you want to use its functionality again (see footnote 2):
    >>> list(func())
    ['I am', 'a generator!']
One can yield data programmatically, for example:
    def func(an_iterable):
        for item in an_iterable:
            yield item
The above simple generator is also equivalent to the one below. As of Python 3.3 (and not available in Python 2), you can use yield from:
    def func(an_iterable):
        yield from an_iterable
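However, yield from also allows for delegation to subgenerators, which will be explained in the following section on cooperative delegation with sub-coroutines.
Coroutines:
yield forms an expression that allows data to be sent into the generator (see footnote 3).
Here is an example; take note of the received variable, which will point to the data that is sent to the generator:
    def bank_account(deposited, interest_rate):
        while True:
            calculated_interest = interest_rate * deposited
            received = yield calculated_interest
            if received:
                deposited += received

    >>> my_account = bank_account(1000, .05)
First, we must queue up the generator with the builtin function, next. It will call the appropriate next or __next__ method, depending on the version of Python you are using:
    >>> first_year_interest = next(my_account)
    >>> first_year_interest
    50.0
And now we can send data into the generator. (Sending None is the same as calling next.)
    >>> next_year_interest = my_account.send(first_year_interest + 1000)
    >>> next_year_interest
    102.5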
Cooperative Delegation to Sub-Coroutine with yield from
Now, recall that yield from is available in Python 3. This allows us to delegate coroutines to a subcoroutine:
    def money_manager(expected_rate):
        under_management = yield     # must receive deposited value
        while True:
            try:
                additional_investment = yield expected_rate * under_management
                if additional_investment:
                    under_management += additional_investment
            except GeneratorExit:
                '''TODO: write function to send unclaimed funds to state'''
                raise    # re-raise: yielding again after GeneratorExit makes close() fail
            finally:
                '''TODO: write function to mail tax info to client'''


    def investment_account(deposited, manager):
        '''very simple model of an investment account that delegates to a manager'''
        next(manager) # must queue up manager
        manager.send(deposited)
        while True:
            try:
                yield from manager
            except GeneratorExit:
                return manager.close()
And now we can delegate functionality to a sub-generator, and it can be used just like the generator above:
    >>> my_manager = money_manager(.06)
    >>> my_account = investment_account(1000, my_manager)
    >>> first_year_return = next(my_account)
    >>> first_year_return
    60.0
    >>> next_year_return = my_account.send(first_year_return + 1000)
    >>> next_year_return
    123.6
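You can read more about the precise semantics of yield from in PEP 380.
Other Methods: close and throw
The close method raises GeneratorExit at the point where the function execution was frozen. This will also be called by __del__, so you can put any cleanup code where you handle the GeneratorExit:
    >>> my_account.close()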
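As a small sketch of that cleanup behaviour (Python 3), the generator below uses try/finally, which also runs when GeneratorExit is raised at the paused yield. The events list and the function name are illustrative only.

```python
events = []  # hypothetical log of what happened, for illustration

def resource_user():
    events.append("acquired")
    try:
        while True:
            yield "working"
    finally:
        # Runs when close() raises GeneratorExit at the paused yield
        events.append("released")

worker = resource_user()
status = next(worker)   # run up to the first yield
worker.close()          # triggers the finally block
```

After close() the generator is finished, so any further next() call raises StopIteration.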
You can also throw an exception, which can be handled in the generator or propagated back to the user:
    >>> import sys
    >>> try:
    ...     raise ValueError
    ... except:
    ...     my_manager.throw(*sys.exc_info())
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
      File "<stdin>", line 2, in <module>
    ValueError
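The session above shows the exception propagating back to the caller. For the other case, where the generator handles it, here is a minimal sketch (Python 3) with a made-up generator:

```python
def resilient():
    # hypothetical generator that recovers from ValueError and carries on
    while True:
        try:
            yield "ok"
        except ValueError:
            yield "recovered"

gen = resilient()
before = next(gen)                       # paused at yield "ok"
during = gen.throw(ValueError("boom"))   # handled inside; throw returns the next yielded value
after = next(gen)                        # back to normal operation
```

An exception type the generator does not catch, such as KeyError here, would propagate out of throw() to the caller and finish the generator.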
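Conclusion
I believe I have covered all aspects of the following question:
What does the yield keyword do in Python?
It turns out that yield does a lot. I'm sure I could add even more thorough examples to this. If you want more or have some constructive criticism, let me know by commenting below.
Appendix:
Critique of the Top/Accepted Answer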
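- It is confused about what makes an iterable, just using a list as an example. See my references above, but in summary: an iterable has an __iter__ method returning an iterator. An iterator provides a .next (Python 2) or .__next__ (Python 3) method, which is implicitly called by for loops until it raises StopIteration, and once it does, it will continue to do so.
- It then uses a generator expression to describe what a generator is. Since a generator expression is simply a convenient way to create an iterator, it only confuses the matter, and we still have not gotten to the yield part.
- In Controlling a generator exhaustion he calls the .next method, when instead he should use the builtin function, next. It would be an appropriate layer of indirection, because his code does not work in Python 3.
- Itertools? This was not relevant to what yield does at all.
- There is no discussion of the methods that yield provides, nor of the new yield from functionality in Python 3. The top/accepted answer is a very incomplete answer.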
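Critique of the answer suggesting yield in a generator expression or comprehension.
The grammar currently allows any expression in a list comprehension.
    expr_stmt: testlist_star_expr (annassign | augassign (yield_expr|testlist) |
                         ('=' (yield_expr|testlist_star_expr))*)
    ...
    yield_expr: 'yield' [yield_arg]
    yield_arg: 'from' test | testlist
Since yield is an expression, it has been touted by some as interesting to use in comprehensions or generator expressions, in spite of citing no particularly good use-case.
The CPython core developers are discussing deprecating its allowance. Here's a relevant post from the mailing list: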
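On 30 January 2017 at 19:05, Brett Cannon wrote:
On Sun, 29 Jan 2017 at 16:39 Craig Rodrigues wrote:
I'm OK with either approach. Leaving things the way they are in Python 3 is no good, IMHO.
My vote is it be a SyntaxError since you're not getting what you expect from the syntax.
I'd agree that's a sensible place for us to end up, as any code relying on the current behaviour is really too clever to be maintainable.
In terms of getting there, we'll likely want:
- SyntaxWarning or DeprecationWarning in 3.7
- Py3k warning in 2.7.x
- SyntaxError in 3.8
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Further, there is an outstanding issue (10544) which seems to be pointing in the direction of this never being a good idea (PyPy, a Python implementation written in Python, is already raising syntax warnings).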
Bottom line, until the developers of CPython tell us otherwise: Don't put yield in a generator expression or comprehension.
The return statement in a generator
In Python 2:
In a generator function, the return statement is not allowed to include an expression_list. In that context, a bare return indicates that the generator is done and will cause StopIteration to be raised.
An expression_list is basically any number of expressions separated by commas. Essentially, in Python 2, you can stop the generator with return, but you can't return a value.
In Python 3:
In a generator function, the return statement indicates that the generator is done and will cause StopIteration to be raised. The returned value (if any) is used as an argument to construct StopIteration and becomes the StopIteration.value attribute.
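The Python 3 behaviour quoted above can be observed with a short sketch (the function names are made up for illustration):

```python
def gen_with_return():
    yield 1
    return "done"           # becomes StopIteration.value

gen = gen_with_return()
first = next(gen)
try:
    next(gen)
except StopIteration as stop:
    returned = stop.value   # the value passed to return

def delegator():
    # yield from re-yields the values and evaluates to the return value
    result = yield from gen_with_return()
    yield result

delegated = list(delegator())
```

A for loop silently discards the return value; it is only visible through StopIteration.value or as the result of a yield from expression.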
Footnotes
1. The languages CLU, Sather, and Icon were referenced in the proposal to introduce the concept of generators to Python. The general idea is that a function can maintain internal state and yield intermediate data points on demand by the user. This promised to be superior in performance to other approaches, including Python threading, which isn't even available on some systems.
2. This means, for example, that xrange objects (range in Python 3) aren't Iterators, even though they are iterable, because they can be reused. Like lists, their __iter__ methods return iterator objects.
3. yield was originally introduced as a statement, meaning that it could only appear at the beginning of a line in a code block. Now yield creates a yield expression. https://docs.python.org/2/reference/simple_stmts.html#grammar-token-yield_stmt This change was proposed to allow a user to send data into the generator just as one might receive it. To send data, one must be able to assign it to something, and for that, a statement just won't work.
There's one extra thing to mention: a function that yields doesn't actually have to terminate. I've written code like this:
    def fib():
        last, cur = 0, 1
        while True:
            yield cur
            last, cur = cur, last + cur
Then I can use it in other code like this:
    for f in fib():
        if some_condition: break
        coolfuncs(f)
It really helps simplify some problems, and makes some things easier to work with.
For those who prefer a minimal working example, meditate on this interactive Python session:
    >>> def f():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...
    >>> g = f()
    >>> for i in g:
    ...     print(i)
    ...
    1
    2
    3
    >>> for i in g:
    ...     print(i)
    ...
    >>> # Note that this time nothing was printed
Yield gives you a generator.
    >>> def get_odd_numbers(i):
    ...     return range(1, i, 2)
    ...
    >>> def yield_odd_numbers(i):
    ...     for x in range(1, i, 2):
    ...         yield x
    ...
    >>> foo = get_odd_numbers(10)
    >>> bar = yield_odd_numbers(10)
    >>> foo
    [1, 3, 5, 7, 9]
    >>> bar
    <generator object yield_odd_numbers at 0x1029c6f50>
    >>> bar.next()
    1
    >>> bar.next()
    3
    >>> bar.next()
    5
As you can see, in the first case foo holds the entire list in memory at once. It's not a big deal for a list with 5 elements, but what if you want a list of 5 million? Not only is this a huge memory eater, it also costs a lot of time to build at the time that the function is called. In the second case, bar just gives you a generator. A generator is an iterable–which means you can use it in a for loop, etc, but each value can only be accessed once. All the values are also not stored in memory at the same time; the generator object "remembers" where it was in the looping the last time you called it–this way, if you're using an iterable to (say) count to 50 billion, you don't have to count to 50 billion all at once and store the 50 billion numbers to count through. Again, this is a pretty contrived example, you probably would use itertools if you really wanted to count to 50 billion. 🙂
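To put a rough number on the memory difference, here is a sketch for Python 3 on CPython. sys.getsizeof reports only the size of the container object itself, not of the integers it refers to, so it understates the list's real footprint. The sizes are implementation details, which is why no exact figures are shown.

```python
import sys

def yield_odd_numbers(limit):
    for x in range(1, limit, 2):
        yield x

as_list = list(range(1, 100_000, 2))        # all 50,000 values exist at once
as_generator = yield_odd_numbers(100_000)   # nothing has been computed yet

list_bytes = sys.getsizeof(as_list)
generator_bytes = sys.getsizeof(as_generator)
```

The generator object stays the same small size no matter how large the limit is, while the list grows with the number of elements.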
This is the most simple use case of generators. As you said, it can be used to write efficient permutations, using yield to push things up through the call stack instead of using some sort of stack variable. Generators can also be used for specialized tree traversal, and all manner of other things.
There is one type of answer that I don't feel has been given yet, among the many great answers that describe how to use generators. Here is the PL theory answer:
The yield statement in python returns a generator. A generator in python is a function that returns continuations (and specifically a type of coroutine, but continuations represent the more general mechanism to understand what is going on).
Continuations in programming languages theory are a much more fundamental kind of computation, but they are not often used because they are extremely hard to reason about and also very difficult to implement. But the idea of what a continuation is, is straightforward: it is the state of a computation that has not yet finished. In this state are saved the current values of variables and the operations that have yet to be performed, and so on. Then at some point later in the program the continuation can be invoked, such that the program's variables are reset to that state and the operations that were saved are carried out.
Continuations, in this more general form, can be implemented in two ways. In the call/cc way, the program's stack is literally saved and then when the continuation is invoked, the stack is restored.
In continuation passing style (CPS), continuations are just normal functions (only in languages where functions are first class) which the programmer explicitly manages and passes around to subroutines. In this style, program state is represented by closures (and the variables that happen to be encoded in them) rather than variables that reside somewhere on the stack. Functions that manage control flow accept continuation as arguments (in some variations of CPS, functions may accept multiple continuations) and manipulate control flow by invoking them by simply calling them and returning afterwards. A very simple example of continuation passing style is as follows:
    def save_file(filename):
        def write_file_continuation():
            write_stuff_to_file(filename)

        check_if_file_exists_and_user_wants_to_overwrite( write_file_continuation )
In this (very simplistic) example, the programmer saves the operation of actually writing the file into a continuation (which can potentially be a very complex operation with many details to write out), and then passes that continuation (ie, as a first-class closure) to another operator which does some more processing, and then calls it if necessary. (I use this design pattern a lot in actual GUI programming, either because it saves me lines of code or, more importantly, to manage control flow after GUI events trigger)
The rest of this post will, without loss of generality, conceptualize continuations as CPS, because it is a hell of a lot easier to understand and read.
Now let's talk about generators in python. Generators are a specific subtype of continuation. Whereas continuations are able in general to save the state of a computation (ie, the program's call stack), generators are only able to save the state of iteration over an iterator. This definition is slightly misleading for certain use cases of generators, though. For example:
    def f():
        while True:
            yield 4
This is clearly a reasonable iterable whose behavior is well defined: each time the generator iterates over it, it returns 4 (and does so forever). But it probably isn't the prototypical type of iterable that comes to mind when thinking of iterators (ie, for x in collection: do_something(x)). This example illustrates the power of generators: if anything is an iterator, a generator can save the state of its iteration.
To reiterate: Continuations can save the state of a program's stack and generators can save the state of iteration. This means that continuations are a lot more powerful than generators, but also that generators are a lot, lot easier. They are easier for the language designer to implement, and they are easier for the programmer to use (if you have some time to burn, try to read and understand this page about continuations and call/cc).
But you could easily implement (and conceptualize) generators as a simple, specific case of continuation passing style:
Whenever yield is called, it tells the function to return a continuation. When the function is called again, it starts from wherever it left off. So, in pseudo-pseudocode (ie, not pseudocode but not code) the generator's next method is basically as follows:
    class Generator():
        def __init__(self,iterable,generatorfun):
            self.next_continuation = lambda:generatorfun(iterable)

        def next(self):
            value, next_continuation = self.next_continuation()
            self.next_continuation = next_continuation
            return value
where the yield keyword is actually syntactic sugar for the real generator function, basically something like:
    def generatorfun(iterable):
        if len(iterable) == 0:
            raise StopIteration
        else:
            return (iterable[0], lambda:generatorfun(iterable[1:]))
Remember that this is just pseudocode and the actual implementation of generators in python is more complex. But as an exercise to understand what is going on, try to use continuation passing style to implement generator objects without use of the yield keyword.
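One possible take on that exercise, as a sketch only: the pseudocode above adapted to Python 3 by adding __iter__ and renaming next to __next__ so that for loops and list() accept it. It works on sliceable sequences only, and it is in no way how CPython implements generators.

```python
class Generator:
    """Toy iterator driven by explicitly passed continuations."""

    def __init__(self, iterable, generatorfun):
        # The first continuation: start the generator function from the beginning
        self.next_continuation = lambda: generatorfun(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        # Each step returns a value plus the continuation for the rest
        value, self.next_continuation = self.next_continuation()
        return value

def generatorfun(sequence):
    if len(sequence) == 0:
        raise StopIteration
    # "Yield" the head; the continuation resumes with the tail
    return sequence[0], lambda: generatorfun(sequence[1:])

values = list(Generator([1, 2, 3], generatorfun))
```

The saved state here is just the remaining slice captured by the lambda. A real generator instead keeps the whole paused function frame, which is why it can resume in the middle of arbitrary code.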
It's returning a generator. I'm not particularly familiar with Python, but I believe it's the same kind of thing as C#'s iterator blocks if you're familiar with those.
There's an IBM article which explains it reasonably well (for Python) as far as I can see.
The key idea is that the compiler/interpreter/whatever does some trickery so that as far as the caller is concerned, they can keep calling next() and it will keep returning values – as if the generator method was paused . Now obviously you can't really "pause" a method, so the compiler builds a state machine for you to remember where you currently are and what the local variables etc look like. This is much easier than writing an iterator yourself.
TL;DR
When you find yourself building a list from scratch...
    def squares_list(n):
        the_list = []                         # Replace
        for x in range(n):
            y = x * x
            the_list.append(y)                # these
        return the_list                       # lines
...yield the pieces instead.
    def squares_the_yield_way(n):
        for x in range(n):
            y = x * x
            yield y                           # with this.
This was my first aha-moment with yield.
yield is a sugary way to say
build a series of stuff
Same behavior:
    >>> for square in squares_list(4):
    ...     print(square)
    ...
    0
    1
    4
    9
    >>> for square in squares_the_yield_way(4):
    ...     print(square)
    ...
    0
    1
    4
    9
Different behavior:
Yield is single-use: you can only iterate through once. Conceptually the yield-function returns an ordered container of things. But it's revealing that we call any function with a yield in it a generator function. And the term for what it returns is an iterator.
Yield is lazy, it puts off computation until you need it. A function with a yield in it doesn't actually execute at all when you call it. The iterator object it returns uses magic to maintain the function's internal context. Each time you call next() on the iterator (as happens in a for-loop), execution inches forward to the next yield. (Or return, which raises StopIteration and ends the series.)
Yield is versatile. It can do infinite loops:
    >>> def squares_all_of_them():
    ...     x = 0
    ...     while True:
    ...         yield x * x
    ...         x += 1
    ...
    >>> squares = squares_all_of_them()
    >>> for i in range(6):
    ...     print(next(squares))
    ...
    0
    1
    4
    9
    16
    25
If you need multi-use and the series isn't humongous, just pass the iterator to list()
    >>> list(squares_the_yield_way(4))
    [0, 1, 4, 9]
Brilliant choice of the word yield because both meanings of the verb apply:
yield: produce or provide (as in agriculture)
...provide the next data in the series.
yield: give way or relinquish (as in political power)
...relinquish CPU execution until the iterator advances.
An example in plain language. I will provide a correspondence between high-level human concepts to low-level python concepts.
I want to operate on a sequence of numbers, but I don't want to bother my self with the creation of that sequence, I want only to focus on the operation I want to do. So, I do the following:
- I call you and tell you that I want a sequence of numbers which is produced in a specific way, and I let you know what the algorithm is.
  This step corresponds to defining the generator function, ie the function containing a yield.
- Sometime later, I tell you, "ok, get ready to tell me the sequence of numbers".
  This step corresponds to calling the generator function which returns a generator object. Note that you don't tell me any numbers yet, you just grab your paper and pencil.
- I ask you, "tell me the next number", and you tell me the first number; after that, you wait for me to ask you for the next number. It's your job to remember where you were, what numbers you have already said, what is the next number. I don't care about the details.
  This step corresponds to calling .next() on the generator object.
- ... repeat previous step, until...
- eventually, you might come to an end. You don't tell me a number, you just shout, "hold your horses! I'm done! No more numbers!"
  This step corresponds to the generator object ending its job, and raising a StopIteration exception. The generator function does not need to raise the exception, it's raised automatically when the function ends or issues a return.
This is what a generator does (a function that contains a yield); it starts executing, pauses whenever it does a yield, and when asked for a .next() value it continues from the point it was last. It fits perfectly by design with the iterator protocol of python, which describes how to sequentially request for values.
The most famous user of the iterator protocol is the for command in python. So, whenever you do a:
    for item in sequence:
it doesn't matter if sequence is a list, a string, a dictionary or a generator object like described above; the result is the same: you read items off a sequence one by one.
Note that defining a function which contains a yield keyword is not the only way to create a generator; it's just the easiest way to create one.
For more accurate information, read about iterator types , the yield statement and generators in the Python documentation.
While a lot of answers show why you'd use a yield to create a generator, there are more uses for yield. It's quite easy to make a coroutine, which enables the passing of information between two blocks of code. I won't repeat any of the fine examples that have already been given about using yield to create a generator.
To help understand what a yield does in the following code, you can use your finger to trace the cycle through any code that has a yield. Every time your finger hits the yield, you have to wait for a next or a send to be entered. When a next is called, you trace through the code until you hit the yield... the code on the right of the yield is evaluated and returned to the caller... then you wait. When next is called again, you perform another loop through the code. However, you'll note that in a coroutine, yield can also be used with a send... which will send a value from the caller into the yielding function. If a send is given, then yield receives the value sent, and spits it out the left hand side... then the trace through the code progresses until you hit the yield again (returning the value at the end, as if next was called).
For example:
    >>> def coroutine():
    ...     i = -1
    ...     while True:
    ...         i += 1
    ...         val = (yield i)
    ...         print("Received %s" % val)
    ...
    >>> sequence = coroutine()
    >>> sequence.next()
    0
    >>> sequence.next()
    Received None
    1
    >>> sequence.send('hello')
    Received hello
    2
    >>> sequence.close()
yield has another use and meaning (since Python 3.3):
yield from <expr>
From PEP 380 (http://www.python.org/dev/peps/pep-0380/):
A syntax is proposed for a generator to delegate part of its operations to another generator. This allows a section of code containing 'yield' to be factored out and placed in another generator. Additionally, the subgenerator is allowed to return with a value, and the value is made available to the delegating generator.
The new syntax also opens up some opportunities for optimisation when one generator re-yields values produced by another.
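A minimal sketch of what that delegation looks like in practice (Python 3.3 or later; inner and outer are names invented for this example):

```python
def inner():
    """Subgenerator: yields two values, then returns a result."""
    yield 1
    yield 2
    return "inner done"


def outer():
    """Delegating generator: re-yields everything inner() produces."""
    result = yield from inner()  # result receives inner()'s return value
    yield result


values = list(outer())  # [1, 2, 'inner done']
```

Without yield from, outer would need its own for loop over inner() and would have no simple way to pick up the returned value.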
Moreover, Python 3.5 introduced:
    async def new_coroutine(data):
        ...
        await blocking_action()
so that coroutines are not confused with regular generators (until then, yield was used for both).
I was going to post "read page 19 of Beazley's 'Python: Essential Reference' for a quick description of generators", but so many others have posted good descriptions already.
Also, note that yield can be used in coroutines as the dual of its use in generator functions. Although it isn't the same use as in your code snippet, (yield) can be used as an expression in a function. When a caller sends a value to the coroutine using the send() method, the coroutine executes until the next (yield) expression is encountered.
Generators and coroutines are a cool way to set up data-flow type applications. I thought it would be worthwhile knowing about the other use of the yield statement in functions.
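To make the send() side concrete, here is a small sketch of a coroutine that receives numbers and keeps a running sum (running_total is a made-up name, not something from the question):

```python
def running_total():
    """Coroutine: receives numbers via send() and keeps a running sum."""
    total = 0
    while True:
        value = yield total  # hand total to the caller, then wait for send()
        total += value


acc = running_total()
first = next(acc)        # prime the coroutine: run up to the first yield
after_10 = acc.send(10)  # resumes at the yield with value == 10
after_5 = acc.send(5)    # resumes again with value == 5
acc.close()              # stop the coroutine
```

The first next() call is needed because a value can only be sent to a generator that is already paused at a yield.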
Here are some Python examples of how to actually implement generators as if Python did not provide syntactic sugar for them:
As a Python generator:
    from itertools import islice

    def fib_gen():
        a, b = 1, 1
        while True:
            yield a
            a, b = b, a + b

    assert [1, 1, 2, 3, 5] == list(islice(fib_gen(), 5))
Using lexical closures instead of generators
    def ftake(fnext, last):
        # range works on both Python 2 and 3 (the original used xrange)
        return [fnext() for _ in range(last)]

    def fib_gen2():
        # funky scope due to python2.x workaround
        # for python 3.x use nonlocal
        def _():
            _.a, _.b = _.b, _.a + _.b
            return _.a
        _.a, _.b = 0, 1
        return _

    assert [1, 1, 2, 3, 5] == ftake(fib_gen2(), 5)
Using object closures instead of generators (because ClosuresAndObjectsAreEquivalent )
    class fib_gen3:
        def __init__(self):
            self.a, self.b = 1, 1

        def __call__(self):
            r = self.a
            self.a, self.b = self.b, self.a + self.b
            return r

    assert [1, 1, 2, 3, 5] == ftake(fib_gen3(), 5)
From a programming viewpoint, the iterators are implemented as thunks
http://en.wikipedia.org/wiki/Thunk_(functional_programming)
To implement iterators/generators/thread pools for concurrent execution/etc as thunks (also called anonymous functions), one uses messages sent to a closure object, which has a dispatcher, and the dispatcher answers to "messages".
http://en.wikipedia.org/wiki/Message_passing
" next " is a message sent to a closure, created by " iter " call.
There are lots of ways to implement this computation. I used mutation, but it is easy to do it without mutation, by returning the current value and the next yielder.
Here is a demonstration which uses the structure of R6RS, but the semantics are the same as in Python: it's the same model of computation, and only a change in syntax is required to rewrite it in Python.
    Welcome to Racket v6.5.0.3.
    -> (define gen
         (lambda (l)
           (define yield
             (lambda ()
               (if (null? l)
                   'END
                   (let ((v (car l)))
                     (set! l (cdr l))
                     v))))
           (lambda (m)
             (case m
               ('yield (yield))
               ('init (lambda (data)
                        (set! l data)
                        'OK))))))
    -> (define stream (gen '(1 2 3)))
    -> (stream 'yield)
    1
    -> (stream 'yield)
    2
    -> (stream 'yield)
    3
    -> (stream 'yield)
    'END
    -> ((stream 'init) '(a b))
    'OK
    -> (stream 'yield)
    'a
    -> (stream 'yield)
    'b
    -> (stream 'yield)
    'END
    -> (stream 'yield)
    'END
    ->
Here is a simple example:
    def isPrimeNumber(n):
        print("isPrimeNumber({}) call".format(n))
        if n == 1:
            return False
        for x in range(2, n):
            if n % x == 0:
                return False
        return True

    def primes(n=1):
        while True:
            print("loop step ---------------- {}".format(n))
            if isPrimeNumber(n):
                yield n
            n += 1

    for n in primes():
        if n > 10:
            break
        print("writing result {}".format(n))
Output:
    loop step ---------------- 1
    isPrimeNumber(1) call
    loop step ---------------- 2
    isPrimeNumber(2) call
    writing result 2
    loop step ---------------- 3
    isPrimeNumber(3) call
    writing result 3
    loop step ---------------- 4
    isPrimeNumber(4) call
    loop step ---------------- 5
    isPrimeNumber(5) call
    writing result 5
    loop step ---------------- 6
    isPrimeNumber(6) call
    loop step ---------------- 7
    isPrimeNumber(7) call
    writing result 7
    loop step ---------------- 8
    isPrimeNumber(8) call
    loop step ---------------- 9
    isPrimeNumber(9) call
    loop step ---------------- 10
    isPrimeNumber(10) call
    loop step ---------------- 11
    isPrimeNumber(11) call
I am not a Python developer, but it looks to me like yield holds the position of the program flow, and the next loop starts from the "yield" position. It seems like it waits at that position, returns a value to the outside just before that, and continues to work the next time.
It seems to me an interesting and nice ability 😀
Here is a mental image of what yield does.
I like to think of a thread as having a stack (even when it's not implemented that way).
When a normal function is called, it puts its local variables on the stack, does some computation, then clears the stack and returns. The values of its local variables are never seen again.
With a yield function, when its code begins to run (i.e. after the function is called, returning a generator object, whose next() method is then invoked), it similarly puts its local variables onto the stack and computes for a while. But then, when it hits the yield statement, before clearing its part of the stack and returning, it takes a snapshot of its local variables and stores them in the generator object. It also writes down the place where it's currently up to in its code (i.e. the particular yield statement).
So it's a kind of frozen function that the generator is hanging onto.
When next() is called subsequently, it retrieves the function's belongings onto the stack and re-animates it. The function continues to compute from where it left off, oblivious to the fact that it had just spent an eternity in cold storage.
Compare the following examples:
    def normalFunction():
        return
        if False:
            pass

    def yielderFunction():
        return
        if False:
            yield 12
When we call the second function, it behaves very differently from the first. The yield statement might be unreachable, but if it's present anywhere, it changes the nature of what we're dealing with.
    >>> yielderFunction()
    <generator object yielderFunction at 0x07742D28>
Calling yielderFunction() doesn't run its code, but makes a generator out of the code. (Maybe it's a good idea to name such things with the yielder prefix for readability.)
    >>> gen = yielderFunction()
    >>> dir(gen)
    ['__class__',
     ...
     '__iter__',    # Returns gen itself, to make it work uniformly with containers
     ...            # when given to a for loop. (Containers return an iterator instead.)
     'close',
     'gi_code',
     'gi_frame',
     'gi_running',
     'next',        # The method that runs the function's body.
     'send',
     'throw']
The gi_code and gi_frame fields are where the frozen state is stored. Exploring them with dir(..), we can confirm that our mental model above is credible.
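One way to peek at that frozen state, as a sketch on Python 3 (counter is a made-up example function; inspect.getgeneratorstate is in the standard library):

```python
import inspect


def counter():
    count = 0
    while True:
        count += 1
        yield count


gen = counter()
state_before = inspect.getgeneratorstate(gen)  # body has not started running yet

next(gen)
next(gen)
state_paused = inspect.getgeneratorstate(gen)  # frozen at the yield statement
saved_locals = dict(gen.gi_frame.f_locals)     # snapshot of the local variables
```

Here saved_locals holds count with the value it had when the function was last paused, which is the "cold storage" described above.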
Like every answer suggests, yield is used for creating a sequence generator. It's used for generating some sequence dynamically. For example, while reading a file line by line over a network, you can use the yield statement as follows:
    def getNextLines():
        while con.isOpen():
            yield con.read()
You can use it in your code as follows:
    for line in getNextLines():
        doSomeThing(line)
Execution Control Transfer gotcha
Execution control is transferred from getNextLines() to the for loop when yield is executed. Thus, every time the loop asks getNextLines() for its next value, execution resumes from the point where it was paused last time.
In short, the following code
    def simpleYield():
        yield "first time"
        yield "second time"
        yield "third time"
        yield "Now some useful value {}".format(12)

    for i in simpleYield():
        print(i)
will print
    first time
    second time
    third time
    Now some useful value 12
I hope this helps.
yield is like a return element for a function. The difference is that the yield element turns a function into a generator. A generator behaves just like a function until something is 'yielded'. The generator then stops until it is next called, and continues from exactly the point where it stopped. You can get a sequence of all the 'yielded' values in one go by calling list(generator()).
Yield gives you an object
A return in a function returns a single value.
If you want a function to return a huge set of values, use yield.
More importantly, yield acts as a barrier, loosely like a barrier in CUDA: the function hands control back to the caller only when execution reaches the yield.
That is:
It will run the code in your function from the beginning until it hits yield. Then it returns the first value of the loop. After that, every other call runs the loop you have written in the function one more time, returning the next value, until there is no value left to return.
(My answer below only speaks from the perspective of using Python generators, not the underlying implementation of the generator mechanism, which involves some tricks of stack and heap manipulation.)
When yield is used instead of return in a Python function, that function is turned into something special called a generator function. That function will return an object of generator type. The yield keyword is a flag that tells the Python compiler to treat such a function specially. A normal function terminates once it returns a value. But with the help of the compiler, a generator function can be thought of as resumable: the execution context is restored and execution continues from where the last run stopped. This goes on until you explicitly call return, which raises a StopIteration exception (which is also part of the iterator protocol), or until execution reaches the end of the function. I found a lot of references about generators, but this one from the functional programming perspective is the most digestible.
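A small sketch of that termination behaviour on Python 3.3 or later, where a value given to return is carried on the exception (two_then_return is a made-up name):

```python
def two_then_return():
    yield "a"
    yield "b"
    return "finished"  # ends the generator; the value rides on StopIteration


gen = two_then_return()
seen = [next(gen), next(gen)]  # the two yielded values

try:
    next(gen)                  # nothing left to yield
    stop_value = None
except StopIteration as stop:
    stop_value = stop.value    # the value passed to return
```

A for loop catches this StopIteration for you, which is why you normally never see it.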
(Now I want to talk about the rationale behind generators and iterators, based on my own understanding. I hope this can help you grasp the essential motivation of iterators and generators. The concept shows up in other languages as well, such as C#.)
As I understand it, when we want to process a bunch of data, we usually first store the data somewhere and then process it one by one. But this intuitive approach is problematic. If the data volume is huge, it's expensive to store it as a whole beforehand. So instead of storing the data itself directly, why not store some kind of metadata indirectly, i.e. the logic of how the data is computed?
There are 2 approaches to wrap such metadata.
- The OO approach: we wrap the metadata as a class. This is the so-called iterator, which implements the iterator protocol (i.e. the __next__() and __iter__() methods). This is also the commonly seen iterator design pattern.
- The functional approach: we wrap the metadata as a function. This is the so-called generator function. But under the hood, the returned generator object still IS-A iterator, because it also implements the iterator protocol.
Either way, an iterator is created, i.e. some object that can give you the data you want. The OO approach may be a bit complex. Anyway, which one to use is up to you.
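The two approaches can be put side by side in a short sketch (CountdownIterator and countdown are names invented for this comparison):

```python
class CountdownIterator:
    """OO approach: a class implementing the iterator protocol."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Functional approach: a generator function doing the same job."""
    while start > 0:
        yield start
        start -= 1


from_class = list(CountdownIterator(3))
from_generator = list(countdown(3))

# The generator object also implements the iterator protocol.
gen = countdown(3)
is_iterator = iter(gen) is gen and hasattr(gen, "__next__")
```

Both produce the same values; the generator version just lets Python write the __iter__ and __next__ bookkeeping for you.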
The yield keyword simply collects returning results. Think of yield like return +=
Here's a simple yield-based approach to computing the Fibonacci series, explained:
    def fib(limit=50):
        a, b = 0, 1
        for i in range(limit):
            yield b
            a, b = b, a + b
When you enter this into your REPL and then try to call it, you'll get a mystifying result:
    >>> fib()
    <generator object fib at 0x7fa38394e3b8>
This is because the presence of yield signaled to Python that you want to create a generator, that is, an object that generates values on demand.
So, how do you generate these values? This can be done either directly, by using the built-in function next, or indirectly, by feeding it to a construct that consumes values.
Using the built-in next() function, you directly invoke .next / __next__, forcing the generator to produce a value:
    >>> g = fib()
    >>> next(g)
    1
    >>> next(g)
    1
    >>> next(g)
    2
    >>> next(g)
    3
    >>> next(g)
    5
Indirectly, if you provide fib to a for loop, a list initializer, a tuple initializer, or anything else that expects an object that generates/produces values, you'll "consume" the generator until no more values can be produced by it (and it returns):
    results = []
    for i in fib(30):  # consumes fib
        results.append(i)
    # can also be accomplished with
    results = list(fib(30))  # consumes fib
Similarly, with a tuple initializer:
    >>> tuple(fib(5))  # consumes fib
    (1, 1, 2, 3, 5)
A generator differs from a function in the sense that it is lazy. It accomplishes this by maintaining its local state and allowing you to resume whenever you need to.
When you first invoke fib by calling it:
    f = fib()
Python sees that the function contains the yield keyword and simply returns a generator object back at you, without running the body. Not very helpful, it seems.
When you then request that it generate the first value, directly or indirectly, it executes all statements that it finds until it encounters a yield; it then yields back the value you supplied to yield and pauses. For an example that better demonstrates this, let's use some print calls (replace with print "text" if on Python 2):
    def yielder(value):
        """ This is an infinite generator. Only use next on it """
        while 1:
            print("I'm going to generate the value for you")
            print("Then I'll pause for a while")
            yield value
            print("Let's go through it again.")
Now, enter in the REPL:
    >>> gen = yielder("Hello, yield!")
You now have a generator object waiting for a command to generate a value. Use next and see what gets printed:
    >>> next(gen)  # runs until it finds a yield
    I'm going to generate the value for you
    Then I'll pause for a while
    'Hello, yield!'
The unquoted results are what's printed. The quoted result is what is returned from yield. Call next again now:
    >>> next(gen)  # continues from yield and runs again
    Let's go through it again.
    I'm going to generate the value for you
    Then I'll pause for a while
    'Hello, yield!'
The generator remembers it was paused at yield value and resumes from there. The next message is printed, and the search for the yield statement to pause at is performed again (due to the while loop).
Many people use return rather than yield, but in some cases yield can be more efficient and easier to work with.
Here is an example which yield is definitely best for:
return (in function)
    import random

    def return_dates():
        dates = []  # with return you need to create a list then return it
        for i in range(5):
            date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
            dates.append(date)
        return dates
yield (in function)
    def yield_dates():
        for i in range(5):
            date = random.choice(["1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"])
            yield date  # yield makes a generator automatically which works in a similar way, this is much more efficient
Calling functions
    dates_list = return_dates()
    print(dates_list)
    for i in dates_list:
        print(i)

    dates_generator = yield_dates()
    print(dates_generator)
    for i in dates_generator:
        print(i)
Both functions do the same thing, but yield uses 3 lines instead of 5 and has one less variable to worry about.
As you can see, both functions do the same thing; the only difference is that return_dates() gives a list and yield_dates() gives a generator.
A real-life example would be something like reading a file line by line, or any case where you just want to make a generator.
In summary, the yield statement transforms your function into a factory that produces a special object called a generator, which wraps around the body of your original function. When the generator is iterated, it executes your function until it reaches the next yield, then suspends execution and evaluates to the value passed to yield. It repeats this process on each iteration until the path of execution exits the function. For example:
    def simple_generator():
        yield 'one'
        yield 'two'
        yield 'three'

    for i in simple_generator():
        print(i)
simply outputs:
    one
    two
    three
The power comes from using the generator with a loop that calculates a sequence. The generator executes the loop, stopping each time to 'yield' the next result of the calculation. In this way it calculates a list on the fly, the benefit being the memory saved for especially large calculations.
Say you wanted to create your own range function that produces an iterable range of numbers. You could do it like so:
    def myRangeNaive(i):
        n = 0
        range = []
        while n < i:
            range.append(n)
            n = n + 1
        return range
and use it like this:
    for i in myRangeNaive(10):
        print(i)
but this is inefficient because
- You create an array that you only use once (this wastes memory)
- This code actually loops over that array twice! 🙁
Luckily Guido and his team were generous enough to develop generators so we could just do this:
    def myRangeSmart(i):
        n = 0
        while n < i:
            yield n
            n = n + 1
        return

    for i in myRangeSmart(10):
        print(i)
Now, upon each iteration, a function on the generator called next() executes the function until it either reaches a 'yield' statement, at which point it stops and 'yields' the value, or reaches the end of the function. In this case, on the first call, next() executes up to the yield statement and yields 'n'. On the next call it executes the increment statement, jumps back to the 'while', evaluates it, and if true, stops and yields 'n' again. It continues that way until the while condition returns false and the generator jumps to the end of the function.
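To see those next() calls spelled out, here is roughly what the for loop does behind the scenes, written by hand as a sketch (my_range_smart restates the function above under a different name so the block runs on its own):

```python
def my_range_smart(limit):
    n = 0
    while n < limit:
        yield n
        n = n + 1


gen = my_range_smart(3)
collected = []
while True:
    try:
        value = next(gen)   # run the body up to the next yield
    except StopIteration:   # raised when the function body ends
        break
    collected.append(value)
```

The for statement does the same thing, including catching StopIteration to end the loop.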
The yield keyword
At a glance, the yield statement is used to define generators, replacing the return of a function to provide a result to its caller without destroying local variables. Unlike a function, which starts with a new set of variables on each call, a generator resumes execution where it left off.
About Python Generators
Since the yield keyword is only used with generators, it makes sense to recall the concept of generators first.
The idea of generators is to calculate a series of results one-by-one on demand (on the fly). In the simplest case, a generator can be used as a list, where each element is calculated lazily. Let's compare a list and a generator that do the same thing: return powers of two.
    >>> # First, we define a list
    >>> the_list = [2**x for x in range(5)]
    >>>
    >>> # Type check: yes, it's a list
    >>> type(the_list)
    <class 'list'>
    >>>
    >>> # Iterate over items and print them
    >>> for element in the_list:
    ...     print(element)
    ...
    1
    2
    4
    8
    16
    >>>
    >>> # How about the length?
    >>> len(the_list)
    5
    >>>
    >>> # Ok, now a generator.
    >>> # As easy as list comprehensions, but with '()' instead of '[]':
    >>> the_generator = (2**x for x in range(5))
    >>>
    >>> # Type check: yes, it's a generator
    >>> type(the_generator)
    <class 'generator'>
    >>>
    >>> # Iterate over items and print them
    >>> for element in the_generator:
    ...     print(element)
    ...
    1
    2
    4
    8
    16
    >>>
    >>> # Everything looks the same, but the length...
    >>> len(the_generator)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'generator' has no len()
Iterating over the list and the generator looks exactly the same. However, although the generator is iterable, it is not a collection and thus has no length. Collections (lists, tuples, sets, etc.) keep all values in memory and we can access them whenever needed. A generator calculates the values on the fly and forgets them, so it has no overview of its own result set.
Generators are especially useful for memory-intensive tasks, where there is no need to keep all of the elements of a memory-heavy list accessible at the same time. Calculating a series of values one-by-one can also be useful in situations where the complete result is never needed: intermediate results are yielded to the caller until some requirement is satisfied, and further processing stops.
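A sketch of the "complete result is never needed" case, using an infinite generator and stopping as soon as a requirement is met (powers_of_two and the computed list are made up for this illustration):

```python
computed = []  # records which exponents were actually calculated


def powers_of_two():
    """Infinite generator: 1, 2, 4, 8, ..."""
    exponent = 0
    while True:
        computed.append(exponent)
        yield 2 ** exponent
        exponent += 1


first_big = None
for value in powers_of_two():
    if value > 100:       # requirement satisfied, stop asking for more
        first_big = value
        break
```

Only the values up to the first one above 100 are ever calculated; the equivalent list could not even be built, since the series never ends.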
Using the Python yield keyword
A good example is a search task, where typically there is no need to wait for all results to be found. When performing a file-system search, a user would be happier to receive results on the fly, rather than wait for a search engine to go through every single file and only afterwards return results. Are there any people who really navigate through all Google search results until the last page?
Since search functionality cannot be created using list comprehensions, we are going to define a generator using a function with the yield statement/keyword. The yield instruction should be put at the place where the generator returns an intermediate result to the caller and sleeps until the next invocation occurs.
    def search(keyword, filename):
        print('generator started')
        f = open(filename, 'r')
        # Looping through the file line by line
        for line in f:
            if keyword in line:
                # If keyword found, return it
                yield line
        f.close()
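A possible way to try this out, as a sketch: the search function is restated here with a with block (my substitution, so the file is closed even if the caller stops early), and the temporary file and its contents are invented for the demo.

```python
import os
import tempfile


def search(keyword, filename):
    """Yield each line of the file that contains keyword."""
    with open(filename, "r") as f:
        for line in f:
            if keyword in line:
                yield line


# Build a small throwaway file to search in (demo data only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta yield\ngamma\ndelta yield\n")
    path = tmp.name

results = search("yield", path)  # nothing has been read yet
first_match = next(results)      # reads only as far as the first match
remaining = list(results)        # reads the rest of the file
os.remove(path)
```

The caller gets the first hit as soon as it is found and decides for itself whether to keep going.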
So far the most practical aspects of Python generators have been described. For more detailed info and an interesting discussion take a look at the Python Enhancement Proposal 255, which discusses the feature of the language in detail.
Happy Pythoning! For more info go to http://pythoncentral.io/python-generators-and-yield-keyword/
Yet another TL;DR
iterator on list: next() returns the next element of the list
iterator generator: next() will compute the next element on the fly (execute code)
You can see the yield/generator as a way to manually run the control flow from outside (like continuing a loop one step), by calling next, however complex the flow.
NOTE: the generator is NOT a normal function; it remembers previous state like local variables (stack). See other answers or articles for a detailed explanation. The generator can only be iterated over once. You could do without yield, but it would not be as nice, so it can be considered 'very nice' language sugar.
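The "only once" point is easy to check with a short sketch (letters is a made-up generator function):

```python
def letters():
    yield "a"
    yield "b"


gen = letters()
first_pass = list(gen)        # consumes the generator
second_pass = list(gen)       # already exhausted, so nothing is left
fresh_pass = list(letters())  # calling the function again gives a new generator
```

If you need the values more than once, either store them in a list or call the generator function again.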