How do I read a CSV file from a URL in Python?
When I make an API call to the link http://domain.com/passkey=wedsmdjsjmdd

```
curl 'http://domain.com/passkey=wedsmdjsjmdd'
```

I get the employee output data back in CSV file format, like:

```
"Steve","421","0","421","2","","","","","","","","","421","0","421","2"
```

How can I parse this using Python?
I tried:

```python
import csv

cr = csv.reader(open('http://domain.com/passkey=wedsmdjsjmdd', "rb"))
for row in cr:
    print row
```

but it didn't work, and I got an error:

```
http://domain.com/passkey=wedsmdjsjmdd No such file or directory:
```

Thanks!
You need to replace open with urllib.urlopen or urllib2.urlopen.
For example:

```python
import csv
import urllib2

url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)

for row in cr:
    print row
```
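Note that urllib2 and the print statement are Python 2 only. A minimal Python 3 sketch of the same idea using urllib.request (the helper name read_csv_from_url is my own, not from the original answer):

```python
import csv
import io
import urllib.request

def read_csv_from_url(url):
    """Fetch a CSV over HTTP and return its rows as lists of strings."""
    with urllib.request.urlopen(url) as response:
        # urlopen returns a bytes stream; csv.reader needs text, so decode first
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

Called with a URL such as the medals.csv one above, it returns the parsed rows in one list.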
This would output the following:

```
Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...
```
With pandas it is very simple to read a csv file directly from a url:

```python
import pandas as pd

data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')
```
This will read your data in tabular format, which will be very easy to process.
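For instance, once the CSV is loaded the columns are typed and can be operated on directly; here a small inline CSV with invented sample data stands in for the URL response:

```python
import io

import pandas as pd

# Invented sample data standing in for the URL response
csv_text = "name,total\nSteve,421\nAnna,17\n"
data = pd.read_csv(io.StringIO(csv_text))

# Numeric columns support arithmetic directly
print(data["total"].sum())  # prints 438
```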
You could also do this with the requests module:

```python
import csv

import requests

url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')
```
To boost performance when downloading a large file, the following may be a bit more efficient:

```python
import csv
from contextlib import closing

import requests

url = "http://download-and-process-csv-efficiently/python.csv"
with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print row
```
By setting stream=True in the GET request, and passing r.iter_lines() to csv.reader(), we are passing a generator to csv.reader(). This lets csv.reader() lazily iterate over each line in the response with for row in reader. It avoids loading the entire file into memory before processing begins, drastically reducing the memory overhead for large files.
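Under Python 3, r.iter_lines() yields bytes rather than str, so each line must be decoded before csv.reader can parse it. A Python 3 sketch of the same streaming approach (the helper name stream_csv_rows is my own):

```python
import csv

import requests

def stream_csv_rows(url):
    """Lazily yield parsed CSV rows from a URL without buffering the whole file."""
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        # iter_lines() yields bytes in Python 3; decode each line to text
        lines = (line.decode("utf-8") for line in r.iter_lines())
        yield from csv.reader(lines)
```

Because the function is a generator, rows are fetched and parsed on demand as the caller iterates.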
Google Sheets offers a solution: under File -> Publish to the web -> Link, you can create a link that automatically downloads the csv. Instructions and screenshots can also be found here: http://support.aftership.com/article/141-csv-auto-fetch-using-google-drive-spreadsheet