Asynchronous requests with Python's requests library
I tried the example provided in the documentation of Python's requests library:

http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests

With async.map(rs) I get the response codes, but I want to get the content of each page requested. For example,

```python
out = async.map(rs)
print out[0].content
```

just does not work.
Note

The answer below does not apply to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you can just replace requests with grequests below and it should work.

I've left this answer as-is to reflect the original question, which was about using requests < v0.13.0.
To execute multiple tasks asynchronously with async.map, you have to:

- Define a function for what you want to do with each object (your task)
- Add that function as an event hook in your request
- Call async.map on a list of all the requests / actions
Example:
```python
from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print response.url

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    #
    # Note the lack of parentheses following do_something, this is because
    # the response will be used as the first argument automatically
    action_item = async.get(u, hooks={'response': do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)
```
async is now an independent module: grequests.

See here: https://github.com/kennethreitz/grequests

And there: Ideal method for sending multiple HTTP requests over Python?
Installation:

```shell
$ pip install grequests
```
Usage:

Build a stack:
```python
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
```
Send the stack:

```python
grequests.map(rs)
```
The result looks like:

```
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
```
grequests doesn't seem to set a limit on concurrent requests, i.e. when multiple requests are sent to the same server.
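If that matters to you, the cap can be imposed by hand (grequests.map also accepts a size argument that bounds its gevent pool, if I recall the API correctly). A minimal stdlib-only sketch of bounding concurrency, where fetch() is a hypothetical stand-in for a real HTTP call so the pattern runs without network access:

```python
import concurrent.futures

def fetch(url):
    # stand-in for a real call such as: return requests.get(url, timeout=10)
    return 'fetched ' + url

urls = ['http://example.com/%d' % i for i in range(8)]

# max_workers caps how many fetches are in flight at once, so at most
# 3 requests ever hit the server concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(fetch, urls))

print(results[0])  # -> fetched http://example.com/0
```

executor.map preserves input order, so results line up with urls even though completion order may differ.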
Maybe requests-futures is another choice.
```python
from requests_futures.sessions import FuturesSession

session = FuturesSession()

# first request is started in background
future_one = session.get('http://httpbin.org/get')

# second request is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')

# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)

# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
```
This is also suggested in the official documentation. If you don't want to involve gevent, it's a good one.
I tested both requests-futures and grequests. grequests is faster, but brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own simple wrapper of requests around a ThreadPoolExecutor; it is almost as fast as grequests, but without external dependencies.
```python
import requests
import concurrent.futures

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# counters must be initialized before use
resp_err = 0
resp_ok = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1
```
I know this has been closed for a while, but I thought it might be useful to promote another asynchronous solution built on the requests library.
```python
from simple_requests import Requests

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

for response in Requests().swarm(list_of_requests):
    print response.content
```
The docs are here: http://pythonhosted.org/simple-requests/
```python
threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...
```
I have been using Python requests for asynchronous calls against GitHub's API.
For an example, see the code here:
https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72
This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if it is confusing to you and I will document it.
I have also tried some things using the asynchronous methods in Python, but I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted:
http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html
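On newer Python versions, the same fan-out pattern can also be expressed with the standard-library asyncio module. A minimal sketch, where fetch is a hypothetical stand-in for a real asynchronous HTTP call (e.g. via aiohttp), so it runs without network access:

```python
import asyncio

async def fetch(url):
    # stand-in for a real asynchronous HTTP call, e.g. with aiohttp:
    #   async with session.get(url) as resp: return await resp.read()
    await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
    return 'content of ' + url

async def main():
    urls = ['http://python-requests.org', 'http://httpbin.org']
    # gather runs all fetches concurrently and preserves input order
    return await asyncio.gather(*(fetch(u) for u in urls))

results = asyncio.run(main())
print(results[0])  # -> content of http://python-requests.org
```

asyncio.gather returns results in the order the coroutines were passed in, regardless of completion order, which is exactly the out[0].content access the original question wanted.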