Read a CSV file and return a data.frame in Python

I have a CSV file, "value.txt". The first few lines of the file are:

    Date,"price","factor_1","factor_2"
    2012-06-11,1600.20,1.255,1.548
    2012-06-12,1610.02,1.258,1.554
    2012-06-13,1618.07,1.249,1.552
    2012-06-14,1624.40,1.253,1.556
    2012-06-15,1626.15,1.258,1.552
    2012-06-16,1626.15,1.263,1.558
    2012-06-17,1626.15,1.264,1.572

In R, we can read this file with

    price <- read.csv("value.txt")

This returns a data.frame that I can use for statistical operations:

    > price <- read.csv("value.txt")
    > price
            Date   price factor_1 factor_2
    1 2012-06-11 1600.20    1.255    1.548
    2 2012-06-12 1610.02    1.258    1.554
    3 2012-06-13 1618.07    1.249    1.552
    4 2012-06-14 1624.40    1.253    1.556
    5 2012-06-15 1626.15    1.258    1.552
    6 2012-06-16 1626.15    1.263    1.558
    7 2012-06-17 1626.15    1.264    1.572

Is there a Pythonic way to get the same functionality?

pandas to the rescue:

    import pandas as pd
    print(pd.read_csv('value.txt'))

             Date    price  factor_1  factor_2
    0  2012-06-11  1600.20     1.255     1.548
    1  2012-06-12  1610.02     1.258     1.554
    2  2012-06-13  1618.07     1.249     1.552
    3  2012-06-14  1624.40     1.253     1.556
    4  2012-06-15  1626.15     1.258     1.552
    5  2012-06-16  1626.15     1.263     1.558
    6  2012-06-17  1626.15     1.264     1.572

This returns a pandas DataFrame, which is similar to R's data.frame.
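Since the question asks for something usable in statistical operations, here is a minimal sketch of what that could look like on the resulting DataFrame. It assumes pandas is installed, and it reads an inline copy of the first three rows (the `CSV_TEXT` name is only for illustration) instead of `value.txt` so that it runs on its own. Parsing `Date` as the index is one possible choice, not a requirement.

```python
import io

import pandas as pd

# Inline copy of the first rows of value.txt (illustrative stand-in for the file).
CSV_TEXT = """Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
"""

# parse_dates turns the Date column into real timestamps;
# index_col makes it the row index instead of 0, 1, 2, ...
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"], index_col="Date")

print(df.dtypes)            # the numeric columns come back as floats
print(df["price"].mean())   # column-wise statistics work directly
print(df.describe())        # summary statistics for every numeric column
```

With the real file, `io.StringIO(CSV_TEXT)` would simply be replaced by `'value.txt'`.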

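Here is an alternative to the pandas library that uses Python's built-in csv module.

    import csv
    from pprint import pprint

    # newline='' is what the csv module expects for files opened in text mode
    with open('foo.csv', newline='') as f:
        reader = csv.reader(f)
        headers = next(reader)  # the first row holds the column names
        column = {h: [] for h in headers}
        for row in reader:
            for h, v in zip(headers, row):
                column[h].append(v)

    pprint(column)  # Pretty printer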

This will print

    {'Date': ['2012-06-11',
              '2012-06-12',
              '2012-06-13',
              '2012-06-14',
              '2012-06-15',
              '2012-06-16',
              '2012-06-17'],
     'factor_1': ['1.255', '1.258', '1.249', '1.253', '1.258', '1.263', '1.264'],
     'factor_2': ['1.548', '1.554', '1.552', '1.556', '1.552', '1.558', '1.572'],
     'price': ['1600.20',
               '1610.02',
               '1618.07',
               '1624.40',
               '1626.15',
               '1626.15',
               '1626.15']}
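Note that every value in that dictionary is still a string, because the csv module does no type conversion. A rough sketch of one way to get numbers out of it follows. The `read_columns` helper and its `text_columns` parameter are hypothetical names introduced here, and the inline `CSV_TEXT` stands in for the file so the example is self-contained.

```python
import csv
import io
import statistics

# Inline copy of the first rows of value.txt (illustrative stand-in for the file).
CSV_TEXT = """Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
"""


def read_columns(f, text_columns=("Date",)):
    """Return {header: [values]}, converting every column not in text_columns to float."""
    reader = csv.reader(f)
    headers = next(reader)
    columns = {h: [] for h in headers}
    for row in reader:
        for h, v in zip(headers, row):
            columns[h].append(v if h in text_columns else float(v))
    return columns


columns = read_columns(io.StringIO(CSV_TEXT))
print(statistics.mean(columns["price"]))  # plain floats work with the statistics module
```

This is enough for simple aggregates, but anything beyond that is where pandas starts to pay off.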

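You can use the csv module from the Python standard library to work with CSV files.

Example:

    import csv

    # newline='' is what the csv module expects for files opened in text mode
    with open('some.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            print(row)  # each row is a list of strings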
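If you would rather address fields by column name than by position, the same module also provides `csv.DictReader`, which uses the first row as keys. A small sketch, again reading an inline copy of the data (the `CSV_TEXT` name is an assumption made for this example) rather than a file on disk:

```python
import csv
import io

# Inline copy of the first rows of value.txt (illustrative stand-in for the file).
CSV_TEXT = """Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
"""

reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = list(reader)  # one dict per data row, keyed by the header names

for row in rows:
    # Values are still strings, so convert explicitly where numbers are needed.
    print(row["Date"], float(row["price"]))
```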

Not so clean, but:

    import csv

    with open("value.txt", "r") as f:
        csv_reader = csv.reader(f)
        num = ' '  # the header row gets a blank in place of a row number
        for row in csv_reader:
            print(num, '\t'.join(row))
            if num == ' ':
                num = 0
            num = num + 1

It is not very compact, but it does the job:

      Date        price    factor_1  factor_2
    1 2012-06-11  1600.20  1.255     1.548
    2 2012-06-12  1610.02  1.258     1.554
    3 2012-06-13  1618.07  1.249     1.552
    4 2012-06-14  1624.40  1.253     1.556
    5 2012-06-15  1626.15  1.258     1.552
    6 2012-06-16  1626.15  1.263     1.558
    7 2012-06-17  1626.15  1.264     1.572
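The manual counter in that loop could probably be replaced with `enumerate`. The following is only a sketch of that idea: `numbered_lines` is a hypothetical helper, and it works on an inline copy of the data so it can be run without the file.

```python
import csv
import io

# Inline copy of the first rows of value.txt (illustrative stand-in for the file).
CSV_TEXT = """Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
2012-06-13,1618.07,1.249,1.552
"""


def numbered_lines(f):
    """Return the header line followed by tab-separated data lines numbered from 1."""
    reader = csv.reader(f)
    header = next(reader)
    lines = [" \t" + "\t".join(header)]  # blank in place of a row number
    for num, row in enumerate(reader, start=1):
        lines.append(f"{num}\t" + "\t".join(row))
    return lines


lines = numbered_lines(io.StringIO(CSV_TEXT))
print("\n".join(lines))
```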