如何将CSV文件读入.NET Datatable
如何将CSV文件加载到System.Data.DataTable
,基于CSV文件创build数据表?
常规的ADO.netfunction是否允许这样做?
这是一个很好的类,它将使用数据结构将CSV数据复制到数据表中以创buildDataTable:
用于平面文件的便携且高效的通用分析器
configuration简单,易于使用。 我敦促你看看。
我一直在使用OleDb
提供程序。 但是,如果您正在阅读具有数字值的行,但希望将它们视为文本,则会出现问题。 但是,您可以通过创build一个schema.ini
文件来解决该问题。 这是我使用的方法:
// using System.Data; // using System.Data.OleDb; // using System.Globalization; // using System.IO; static DataTable GetDataTableFromCsv(string path, bool isFirstRowHeader) { string header = isFirstRowHeader ? "Yes" : "No"; string pathOnly = Path.GetDirectoryName(path); string fileName = Path.GetFileName(path); string sql = @"SELECT * FROM [" + fileName + "]"; using(OleDbConnection connection = new OleDbConnection( @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathOnly + ";Extended Properties=\"Text;HDR=" + header + "\"")) using(OleDbCommand command = new OleDbCommand(sql, connection)) using(OleDbDataAdapter adapter = new OleDbDataAdapter(command)) { DataTable dataTable = new DataTable(); dataTable.Locale = CultureInfo.CurrentCulture; adapter.Fill(dataTable); return dataTable; } }
我决定使用Sebastien Lorion的Csv Reader 。
Jay Riggs的build议也是一个很好的解决scheme,但我并不需要Andrew Rissing的Generic Parser提供的所有function。
UDPATE 10/25/2010
在我的项目中使用了Sebastien Lorion的Csv Reader近一年半之后,我发现它在parsing一些我认为是格式良好的csv文件时会抛出exception。
所以,我确实转向了安德鲁·里辛(Andrew Rissing)的generics语法分析器(Generic Parser) ,看起来好多了。
UDPATE 9/22/2014
现在,我主要使用这种扩展方法来阅读分隔文本:
https://www.nuget.org/packages/CoreTechs.Common/
UDPATE 2/20/2015
例:
var csv = @"Name, Age Ronnie, 30 Mark, 40 Ace, 50"; TextReader reader = new StringReader(csv); var table = new DataTable(); using(var it = reader.ReadCsvWithHeader().GetEnumerator()) { if (!it.MoveNext()) return; foreach (var k in it.Current.Keys) table.Columns.Add(k); do { var row = table.NewRow(); foreach (var k in it.Current.Keys) row[k] = it.Current[k]; table.Rows.Add(row); } while (it.MoveNext()); }
嘿,它的工作100%
public static DataTable ConvertCSVtoDataTable(string strFilePath) { DataTable dt = new DataTable(); using (StreamReader sr = new StreamReader(strFilePath)) { string[] headers = sr.ReadLine().Split(','); foreach (string header in headers) { dt.Columns.Add(header); } while (!sr.EndOfStream) { string[] rows = sr.ReadLine().Split(','); DataRow dr = dt.NewRow(); for (int i = 0; i < headers.Length; i++) { dr[i] = rows[i]; } dt.Rows.Add(dr); } } return dt; }
CSV图像
数据表导入
我们总是习惯使用Jet.OLEDB驱动程序,直到我们开始使用64位应用程序。 微软并没有也不会发布64位的Jet驱动程序。 这里有一个简单的解决scheme,它使用File.ReadAllLines和String.Split来读取和parsingCSV文件并手动加载一个DataTable。 如上所述,它不处理其中一个列值包含逗号的情况。 我们主要使用它来读取自定义configuration文件 – 关于使用CSV文件的好处是我们可以在Excel中编辑它们。
string CSVFilePathName = @"C:\test.csv"; string[] Lines = File.ReadAllLines(CSVFilePathName); string[] Fields; Fields = Lines[0].Split(new char[] { ',' }); int Cols = Fields.GetLength(0); DataTable dt = new DataTable(); //1st row must be column names; force lower case to ensure matching later on. for (int i = 0; i < Cols; i++) dt.Columns.Add(Fields[i].ToLower(), typeof(string)); DataRow Row; for (int i = 1; i < Lines.GetLength(0); i++) { Fields = Lines[i].Split(new char[] { ',' }); Row = dt.NewRow(); for (int f = 0; f < Cols; f++) Row[f] = Fields[f]; dt.Rows.Add(Row); }
这是我使用它的代码,但你的应用程序必须与净版本3.5运行
private void txtRead_Click(object sender, EventArgs e) { // var filename = @"d:\shiptest.txt"; openFileDialog1.InitialDirectory = "d:\\"; openFileDialog1.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"; DialogResult result = openFileDialog1.ShowDialog(); if (result == DialogResult.OK) { if (openFileDialog1.FileName != "") { var reader = ReadAsLines(openFileDialog1.FileName); var data = new DataTable(); //this assume the first record is filled with the column names var headers = reader.First().Split(','); foreach (var header in headers) { data.Columns.Add(header); } var records = reader.Skip(1); foreach (var record in records) { data.Rows.Add(record.Split(',')); } dgList.DataSource = data; } } } static IEnumerable<string> ReadAsLines(string filename) { using (StreamReader reader = new StreamReader(filename)) while (!reader.EndOfStream) yield return reader.ReadLine(); }
public DataTable CsvFileToDatatable(string path, bool IsFirstRowHeader)//here Path is root of file and IsFirstRowHeader is header is there or not { string header = "No"; string sql = string.Empty; DataTable dataTable = null; string pathOnly = string.Empty; string fileName = string.Empty; try { pathOnly = Path.GetDirectoryName(path); fileName = Path.GetFileName(path); sql = @"SELECT * FROM [" + fileName + "]"; if (IsFirstRowHeader) { header = "Yes"; } using (OleDbConnection connection = new OleDbConnection( @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + pathOnly + ";Extended Properties=\"Text;HDR=" + header + "\"")) { using (OleDbCommand command = new OleDbCommand(sql, connection)) { using (OleDbDataAdapter adapter = new OleDbDataAdapter(command)) { dataTable = new DataTable(); dataTable.Locale = CultureInfo.CurrentCulture; adapter.Fill(dataTable); } } } } finally { } return dataTable; }
我碰到这段代码使用Linq和正则expression式来parsing一个CSV文件。 引用文章现在已经超过一年半了,但还没有遇到一个更好的方法来parsing一个CSV使用Linq(和正则expression式)比这个。 警告是这里应用的正则expression式是逗号分隔的文件(将检测引号内的逗号!),它可能不会很好的头,但有一种方法来克服这些)。 走高峰:
Dim lines As String() = System.IO.File.ReadAllLines(strCustomerFile) Dim pattern As String = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" Dim r As System.Text.RegularExpressions.Regex = New System.Text.RegularExpressions.Regex(pattern) Dim custs = From line In lines _ Let data = r.Split(line) _ Select New With {.custnmbr = data(0), _ .custname = data(1)} For Each cust In custs strCUSTNMBR = Replace(cust.custnmbr, Chr(34), "") strCUSTNAME = Replace(cust.custname, Chr(34), "") Next
对于那些希望不使用外部库的人,不要使用OleDB,请参阅下面的示例。 我发现的一切都是OleDB,外部库,或根据逗号分割! 对于我的情况OleDB不工作,所以我想要不同的东西。
我发现了一篇由MarkJ引用的Microsoft.VisualBasic.FileIO.TextFieldParser方法的文章。 这篇文章是用VB写的,不会返回一个数据表,所以看下面的例子。
public static DataTable LoadCSV(string path, bool hasHeader) { DataTable dt = new DataTable(); using (var MyReader = new Microsoft.VisualBasic.FileIO.TextFieldParser(path)) { MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited; MyReader.Delimiters = new String[] { "," }; string[] currentRow; //'Loop through all of the fields in the file. //'If any lines are corrupt, report an error and continue parsing. bool firstRow = true; while (!MyReader.EndOfData) { try { currentRow = MyReader.ReadFields(); //Add the header columns if (hasHeader && firstRow) { foreach (string c in currentRow) { dt.Columns.Add(c, typeof(string)); } firstRow = false; continue; } //Create a new row DataRow dr = dt.NewRow(); dt.Rows.Add(dr); //Loop thru the current line and fill the data out for(int c = 0; c < currentRow.Count(); c++) { dr[c] = currentRow[c]; } } catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex) { //Handle the exception here } } } return dt; }
您可以实现它通过在C#中使用Microsoft.VisualBasic.FileIO.TextFieldParser DLL
static void Main() { string csv_file_path=@"C:\Users\Administrator\Desktop\test.csv"; DataTable csvData = GetDataTabletFromCSVFile(csv_file_path); Console.WriteLine("Rows count:" + csvData.Rows.Count); Console.ReadLine(); } private static DataTable GetDataTabletFromCSVFile(string csv_file_path) { DataTable csvData = new DataTable(); try { using(TextFieldParser csvReader = new TextFieldParser(csv_file_path)) { csvReader.SetDelimiters(new string[] { "," }); csvReader.HasFieldsEnclosedInQuotes = true; string[] colFields = csvReader.ReadFields(); foreach (string column in colFields) { DataColumn datecolumn = new DataColumn(column); datecolumn.AllowDBNull = true; csvData.Columns.Add(datecolumn); } while (!csvReader.EndOfData) { string[] fieldData = csvReader.ReadFields(); //Making empty value as null for (int i = 0; i < fieldData.Length; i++) { if (fieldData[i] == "") { fieldData[i] = null; } } csvData.Rows.Add(fieldData); } } } catch (Exception ex) { } return csvData; }
由ChuckBevitt先生修改
工作scheme:
string CSVFilePathName = APP_PATH + "Facilities.csv"; string[] Lines = File.ReadAllLines(CSVFilePathName); string[] Fields; Fields = Lines[0].Split(new char[] { ',' }); int Cols = Fields.GetLength(0); DataTable dt = new DataTable(); //1st row must be column names; force lower case to ensure matching later on. for (int i = 0; i < Cols-1; i++) dt.Columns.Add(Fields[i].ToLower(), typeof(string)); DataRow Row; for (int i = 0; i < Lines.GetLength(0)-1; i++) { Fields = Lines[i].Split(new char[] { ',' }); Row = dt.NewRow(); for (int f = 0; f < Cols-1; f++) Row[f] = Fields[f]; dt.Rows.Add(Row); }
这是一个使用ADO.Net的ODBC文本驱动程序的解决scheme:
Dim csvFileFolder As String = "C:\YourFileFolder" Dim csvFileName As String = "YourFile.csv" 'Note that the folder is specified in the connection string, 'not the file. That's specified in the SELECT query, later. Dim connString As String = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" _ & csvFileFolder & ";Extended Properties=""Text;HDR=No;FMT=Delimited""" Dim conn As New Odbc.OdbcConnection(connString) 'Open a data adapter, specifying the file name to load Dim da As New Odbc.OdbcDataAdapter("SELECT * FROM [" & csvFileName & "]", conn) 'Then fill a data table, which can be bound to a grid Dim dt As New DataTableda.Fill(dt) grdCSVData.DataSource = dt
一旦填充完毕,就可以对数据表的属性(如ColumnName)进行赋值,以充分利用ADO.Net数据对象的所有function。
在VS2008中,你可以使用Linq来达到同样的效果。
注意:这可能是这个 SO问题的重复。
public class Csv { public static DataTable DataSetGet(string filename, string separatorChar, out List<string> errors) { errors = new List<string>(); var table = new DataTable("StringLocalization"); using (var sr = new StreamReader(filename, Encoding.Default)) { string line; var i = 0; while (sr.Peek() >= 0) { try { line = sr.ReadLine(); if (string.IsNullOrEmpty(line)) continue; var values = line.Split(new[] {separatorChar}, StringSplitOptions.None); var row = table.NewRow(); for (var colNum = 0; colNum < values.Length; colNum++) { var value = values[colNum]; if (i == 0) { table.Columns.Add(value, typeof (String)); } else { row[table.Columns[colNum]] = value; } } if (i != 0) table.Rows.Add(row); } catch(Exception ex) { errors.Add(ex.Message); } i++; } } return table; } }
我发现的最好的select,它解决了您可能有不同版本的Office安装的问题,也提到像Chuck Bevitt 32/64-bit问题是FileHelpers 。
它可以添加到您的项目使用NuGet的参考,它提供了一个单线解决scheme:
CommonEngine.CsvToDataTable(path, "ImportRecord", ',', true);
不能拒绝添加我自己的旋转。 这比我以前用过的更好,更紧凑。
此解决scheme:
- 不依赖于数据库驱动程序或第三方库。
- 不会失败重复的列名称
- 处理数据中的逗号
- 处理任何分隔符,不只是逗号(虽然这是默认)
以下是我想到的:
Public Function ToDataTable(FileName As String, Optional Delimiter As String = ",") As DataTable ToDataTable = New DataTable Using TextFieldParser As New Microsoft.VisualBasic.FileIO.TextFieldParser(FileName) With {.HasFieldsEnclosedInQuotes = True, .TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited, .TrimWhiteSpace = True} With TextFieldParser .SetDelimiters({Delimiter}) .ReadFields.ToList.Unique.ForEach(Sub(x) ToDataTable.Columns.Add(x)) ToDataTable.Columns.Cast(Of DataColumn).ToList.ForEach(Sub(x) x.AllowDBNull = True) Do Until .EndOfData ToDataTable.Rows.Add(.ReadFields.Select(Function(x) Text.BlankToNothing(x)).ToArray) Loop End With End Using End Function
它取决于一个扩展方法( Unique
)来处理重复的列名称作为我的答案如何将附加唯一的数字string列表
这里是BlankToNothing
帮助函数:
Public Function BlankToNothing(ByVal Value As String) As Object If String.IsNullOrEmpty(Value) Then Return Nothing Return Value End Function
非常基本的答案:如果你没有一个复杂的csv,可以使用一个简单的拆分function,这将适用于导入(注意这导入为string,我做数据types转换,如果我需要以后)
private DataTable csvToDataTable(string fileName, char splitCharacter) { StreamReader sr = new StreamReader(fileName); string myStringRow = sr.ReadLine(); var rows = myStringRow.Split(splitCharacter); DataTable CsvData = new DataTable(); foreach (string column in rows) { //creates the columns of new datatable based on first row of csv CsvData.Columns.Add(column); } myStringRow = sr.ReadLine(); while (myStringRow != null) { //runs until string reader returns null and adds rows to dt rows = myStringRow.Split(splitCharacter); CsvData.Rows.Add(rows); myStringRow = sr.ReadLine(); } sr.Close(); sr.Dispose(); return CsvData; }
我的方法,如果我用一个string[] separater导入表,并处理当前行我读的问题可能已经到了csv或文本文件中的下一行< – 在这种情况下,我想要循环,直到我得到到第一行(列)的总行数
public static DataTable ImportCSV(string fullPath, string[] sepString) { DataTable dt = new DataTable(); using (StreamReader sr = new StreamReader(fullPath)) { //stream uses using statement because it implements iDisposable string firstLine = sr.ReadLine(); var headers = firstLine.Split(sepString, StringSplitOptions.None); foreach (var header in headers) { //create column headers dt.Columns.Add(header); } int columnInterval = headers.Count(); string newLine = sr.ReadLine(); while (newLine != null) { //loop adds each row to the datatable var fields = newLine.Split(sepString, StringSplitOptions.None); // csv delimiter var currentLength = fields.Count(); if (currentLength < columnInterval) { while (currentLength < columnInterval) { //if the count of items in the row is less than the column row go to next line until count matches column number total newLine += sr.ReadLine(); currentLength = newLine.Split(sepString, StringSplitOptions.None).Count(); } fields = newLine.Split(sepString, StringSplitOptions.None); } if (currentLength > columnInterval) { //ideally never executes - but if csv row has too many separators, line is skipped newLine = sr.ReadLine(); continue; } dt.Rows.Add(fields); newLine = sr.ReadLine(); } sr.Close(); } return dt; }