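Python: HTTP POST a large file with streaming

I'm uploading potentially large files to a web server. Currently I'm doing this:

    import urllib2

    f = open('somelargefile.zip', 'rb')
    request = urllib2.Request(url, f.read())
    request.add_header("Content-Type", "application/zip")
    response = urllib2.urlopen(request)

However, this reads the entire file's contents into memory before posting it. How can I have it stream the file to the server?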

Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.

The mmap module allows you to open a file that acts like a string. Parts of the file are loaded into memory on demand.

Here's the code I'm using now:

    import urllib2
    import mmap

    # Open the file as a memory mapped string. Looks like a string, but
    # actually accesses the file behind the scenes.
    f = open('somelargefile.zip', 'rb')
    mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Do the request
    request = urllib2.Request(url, mmapped_file_as_string)
    request.add_header("Content-Type", "application/zip")
    response = urllib2.urlopen(request)

    # Close everything
    mmapped_file_as_string.close()
    f.close()
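To make the "acts like a string" part concrete, here is a minimal sketch in Python 3 (the thread itself uses Python 2). The helper name `mapped_preview` and the temporary demo file are assumptions made for illustration, not part of the answer above.

```python
import mmap
import os
import tempfile


def mapped_preview(path, count=4):
    """Return (total_size, first `count` bytes) of a file via a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; mapping an empty file raises an error.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # len() and slicing work as they do on a bytes object, but only
            # the parts that are actually touched are loaded from disk.
            return len(mapped), mapped[:count]


# Demo with a small temporary file standing in for somelargefile.zip.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"PK\x03\x04" + b"\0" * 1000)
    demo_path = tmp.name

demo_size, demo_head = mapped_preview(demo_path)
os.remove(demo_path)
```

Because the map supports `len()`, code that needs the body size for a Content-Length header can get it without reading the file.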

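Have you tried Mechanize?

    from mechanize import Browser

    br = Browser()
    br.open(url)
    # A form must be selected before br.form can be used; this assumes the
    # upload form is the first form on the page.
    br.select_form(nr=0)
    br.form.add_file(open('largefile.zip', 'rb'), 'application/zip', 'largefile.zip')
    br.submit()

Or, if you don't want to use multipart/form-data, check this old post.

It suggests two options:

  1. Use mmap, a memory-mapped file object
  2. Patch httplib.HTTPConnection.send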

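The documentation doesn't say you can do this, but the code in urllib2 (and httplib) accepts any object with a read() method as data. So using an open file seems to do the trick.

You'll need to set the Content-Length header yourself. If it isn't set, urllib2 will call len() on the data, which file objects don't support.

    import os.path
    import urllib2

    # Binary mode, so the bytes sent match the size reported below.
    data = open(filename, 'rb')
    headers = {'Content-Length': os.path.getsize(filename)}
    # Headers belong on the Request; the third positional argument of
    # urlopen() is a timeout, not a header dict.
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)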
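For readers on Python 3, where urllib2 became urllib.request, the same idea might look like the sketch below. The function name `build_streaming_request` and its default content type are assumptions made for this example; only the request is built here, nothing is sent.

```python
import os
import urllib.request


def build_streaming_request(url, path, content_type="application/octet-stream"):
    """Build a POST request whose body is an open file rather than a bytes string.

    The caller is responsible for closing request.data after sending.
    """
    body = open(path, "rb")  # binary mode so the bytes go out unchanged
    headers = {
        "Content-Type": content_type,
        # The size comes from the file system, so nothing has to call
        # len() on the file object.
        "Content-Length": str(os.path.getsize(path)),
    }
    return urllib.request.Request(url, data=body, headers=headers)
```

Passing the returned object to `urllib.request.urlopen()` would perform the upload; close `request.data` afterwards, ideally in a `try`/`finally`.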

This is the relevant code that handles the data you supply. It's from the HTTPConnection class in httplib.py in Python 2.7:

    def send(self, data):
        """Send `data' to the server."""
        if self.sock is None:
            if self.auto_open:
                self.connect()
            else:
                raise NotConnected()

        if self.debuglevel > 0:
            print "send:", repr(data)
        blocksize = 8192
        if hasattr(data,'read') and not isinstance(data, array):
            if self.debuglevel > 0: print "sendIng a read()able"
            datablock = data.read(blocksize)
            while datablock:
                self.sock.sendall(datablock)
                datablock = data.read(blocksize)
        else:
            self.sock.sendall(data)
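The read loop in that method can be tried out in isolation. The following is a Python 3 sketch of the same pattern, not the library code itself: `send_in_blocks` is a hypothetical name, and a plain list's `append` stands in for the socket's `sendall`.

```python
import io


def send_in_blocks(data, sendall, blocksize=8192):
    """Pass `data` to `sendall` in pieces of at most `blocksize` bytes.

    `data` is either a bytes-like object or anything with a read() method.
    Returns the number of sendall() calls made.
    """
    if not hasattr(data, "read"):
        # Plain bytes: hand everything over in one call.
        sendall(data)
        return 1
    calls = 0
    block = data.read(blocksize)
    while block:  # read() returns b"" at end of file
        sendall(block)
        calls += 1
        block = data.read(blocksize)
    return calls


# Stand-in for a socket: collect what would have been sent.
sent = []
call_count = send_in_blocks(io.BytesIO(b"a" * 20000), sent.append)
```

At most one block is held in memory at a time, which is why handing over an open file avoids loading the whole upload.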

Try pycurl. I don't have anything set up that will accept a large file that isn't in a multipart/form-data POST, but here's a simple example that reads the file as needed.

    import os
    import pycurl

    class FileReader:
        def __init__(self, fp):
            self.fp = fp

        def read_callback(self, size):
            return self.fp.read(size)

    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.UPLOAD, 1)
    c.setopt(pycurl.READFUNCTION, FileReader(open(filename, 'rb')).read_callback)
    filesize = os.path.getsize(filename)
    c.setopt(pycurl.INFILESIZE, filesize)
    c.perform()
    c.close()

Using the requests library you can do

    import requests

    with open('massive-body', 'rb') as f:
        requests.post('http://some.url/streamed', data=f)

as mentioned in their docs.
