Download a file from the web in Python 3
I am creating a program that downloads a .jar (Java) file from a web server, by reading the URL specified in the .jad file of the same game/application. I am using Python 3.2.1.
I have managed to extract the URL of the JAR file from the JAD file (every JAD file contains the URL to its JAR file), but as you may imagine, the extracted value is a type() string.
Here is the relevant function:
```python
def downloadFile(URL=None):
    import httplib2
    h = httplib2.Http(".cache")
    resp, content = h.request(URL, "GET")
    return content

downloadFile(URL_from_file)
```
However, I always get an error saying that the type in the function above must be bytes, and not string. I have tried using URL.encode('utf-8'), and also bytes(URL, encoding='utf-8'), but I always get the same or a similar error.
So basically my question is: how do I download a file from a server when the URL is stored in a string type?
If you want to read the contents of a web page into a variable, just read the response of urllib.request.urlopen:
```python
import urllib.request
...
url = 'http://example.com/'
response = urllib.request.urlopen(url)
data = response.read()       # a `bytes` object
text = data.decode('utf-8')  # a `str`; this step can't be used if data is binary
```
The easiest way to download and save a file is to use the urllib.request.urlretrieve function:
```python
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
urllib.request.urlretrieve(url, file_name)
```
```python
import urllib.request
...
# Download the file from `url`, save it in a temporary directory and get the
# path to it (e.g. '/tmp/tmpb48zma.txt') in the `file_name` variable:
file_name, headers = urllib.request.urlretrieve(url)
```
But keep in mind that urlretrieve is considered legacy and might become deprecated (not sure why, though).
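While it is still available, urlretrieve also accepts a reporthook callback, which is handy for simple progress reporting. A minimal sketch; the file:// URL of a temporary local file is a stand-in for a real download URL, so the snippet runs without network access:

```python
import os
import tempfile
import urllib.request

# Create a small local file and a file:// URL for it; in practice
# `url` would be an HTTP URL extracted from the JAD file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'x' * 10000)
    src_path = tmp.name
url = 'file://' + src_path

def report(block_num, block_size, total_size):
    # Called once before the first block and again after each block is read.
    done = min(block_num * block_size, total_size)
    print('downloaded %d of %d bytes' % (done, total_size))

dest = src_path + '.copy'
file_name, headers = urllib.request.urlretrieve(url, dest, reporthook=report)
```

Note that the destination filename must be passed explicitly here: for file:// URLs with no filename, urlretrieve just returns the local path without copying (and without calling the hook).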
So the most correct way to do this is to use the urllib.request.urlopen function, which returns a file-like object representing the HTTP response, and copy it to a real file using shutil.copyfileobj.
```python
import urllib.request
import shutil
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)
```
If this seems too complicated, you may want to go simpler and store the whole download in a bytes object, then write it to a file. But this works well only for small files.
```python
import urllib.request
...
# Download the file from `url` and save it locally under `file_name`:
with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    data = response.read()  # a `bytes` object
    out_file.write(data)
```
It is possible to extract .gz (and maybe other formats) compressed data on the fly, but such an operation probably requires the HTTP server to support random access to the file.
```python
import urllib.request
import gzip
...
# Read the first 64 bytes of the file inside the .gz archive located at `url`
url = 'http://example.com/something.gz'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read(64)  # a `bytes` object
        # Or do anything shown above using `uncompressed` instead of `response`.
```
I use the requests package whenever I want something related to HTTP requests, because its API is very easy to start with:
First, install requests:

```shell
$ pip install requests
```
Then the code:
```python
from requests import get  # to make GET request

def download(url, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url)
        # write to file
        file.write(response.content)
```
I hope I understood the question correctly, which is: how to download a file from a server when the URL is stored in a string type?
I download files and save them locally using the code below:
```python
import requests

url = 'static/img/python-logo.png'
fileName = r'D:\Python\dwnldPythonLogo.png'
req = requests.get(url)
with open(fileName, 'wb') as file:
    for chunk in req.iter_content(100000):
        file.write(chunk)
```
```python
from urllib import request

def get(url):
    with request.urlopen(url) as r:
        return r.read()

def download(url, file=None):
    if not file:
        file = url.split('/')[-1]
    with open(file, 'wb') as f:
        f.write(get(url))
```
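A quick way to exercise this helper without a web server is to point it at a file:// URL. The helpers from the answer above are repeated here so the snippet runs standalone; the temporary .jar file and its fake contents are stand-ins for a real download:

```python
import os
import tempfile
from urllib import request

# Helpers from the answer above, repeated so this snippet is self-contained.
def get(url):
    with request.urlopen(url) as r:
        return r.read()

def download(url, file=None):
    if not file:
        file = url.split('/')[-1]
    with open(file, 'wb') as f:
        f.write(get(url))

# Create a local file and a file:// URL for it, as a stand-in for the
# JAR URL extracted from a JAD file.
with tempfile.NamedTemporaryFile(delete=False, suffix='.jar') as tmp:
    tmp.write(b'PK\x03\x04 fake jar contents')
    src = tmp.name

download('file://' + src)
# With no second argument, the file is saved in the current directory
# under the last segment of the URL:
saved = os.path.basename(src)
print(os.path.getsize(saved))
```

This also shows the one caveat of the `file = url.split('/')[-1]` default: the local name is taken from the URL, so query strings or trailing slashes would produce odd filenames, and you may prefer to pass an explicit name.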