Importing CSV files with mixed data types

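I have been using MATLAB for a few days and I am having difficulty importing a CSV file into a matrix.

My problem is that my CSV file contains almost only strings and a few integer values, so csvread() does not work. csvread() only handles numeric values.

How can I store my strings in some kind of two-dimensional array so that I have free access to each element?

Here is a sample CSV for my needs: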

    04;abc;def;ghj;klm;;;;;
    ;;;;;Test;text;0xFF;;
    ;;;;;asdfhsdf;dsafdsag;0x0F0F;;

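The main things are the empty cells and the text within the cells. As you can see, the structure may vary.

Now that the question has been updated with a sample input file…

If you know how many columns of data there will be in your CSV file, a single call to TEXTSCAN, as Amro suggests, will be your best solution.

However, if you do not know in advance how many columns the file has, you can use a more general approach, as I did in the following function. I first use the function FGETL to read each line of the file into a cell array. Then I use the function TEXTSCAN to parse each line into separate strings using a predefined field delimiter, treating the integer fields as strings for now (they can be converted to numeric values later). Here is the resulting code, placed in a function read_mixed_csv: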

    function lineArray = read_mixed_csv(fileName,delimiter)
      fid = fopen(fileName,'r');   %# Open the file
      lineArray = cell(100,1);     %# Preallocate a cell array (ideally slightly
                                   %#   larger than is needed)
      lineIndex = 1;               %# Index of cell to place the next line in
      nextLine = fgetl(fid);       %# Read the first line from the file
      while ~isequal(nextLine,-1)         %# Loop while not at the end of the file
        lineArray{lineIndex} = nextLine;  %# Add the line to the cell array
        lineIndex = lineIndex+1;          %# Increment the line index
        nextLine = fgetl(fid);            %# Read the next line from the file
      end
      fclose(fid);                 %# Close the file
      lineArray = lineArray(1:lineIndex-1);  %# Remove empty cells, if needed
      for iLine = 1:lineIndex-1              %# Loop over lines
        lineData = textscan(lineArray{iLine},'%s',...  %# Read strings
                            'Delimiter',delimiter);
        lineData = lineData{1};              %# Remove cell encapsulation
        if strcmp(lineArray{iLine}(end),delimiter)  %# Account for when the line
          lineData{end+1} = '';                     %#   ends with a delimiter
        end
        lineArray(iLine,1:numel(lineData)) = lineData;  %# Overwrite line data
      end
    end

Running this function on the sample file content from the question gives this result:

    >> data = read_mixed_csv('myfile.csv',';')

    data =

      Columns 1 through 7

        '04'    'abc'    'def'    'ghj'    'klm'    ''            ''
        ''      ''       ''       ''       ''       'Test'        'text'
        ''      ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

      Columns 8 through 10

        ''          ''    ''
        '0xFF'      ''    ''
        '0x0F0F'    ''    ''

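The result is a 3-by-10 cell array with one field per cell, where missing fields are represented by the empty string ''. You can now access each cell, or a combination of cells, and format them as you like. For example, to change the fields in the first column from strings to integer values, you could use the function STR2DOUBLE as follows: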

    >> data(:,1) = cellfun(@(s) {str2double(s)},data(:,1))

    data =

      Columns 1 through 7

        [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''
        [NaN]    ''       ''       ''       ''       'Test'        'text'
        [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'

      Columns 8 through 10

        ''          ''    ''
        '0xFF'      ''    ''
        '0x0F0F'    ''    ''

Note that the empty fields result in NaN values.
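If the NaN placeholders are unwanted, one option is to convert only the cells that actually parse as numbers and leave everything else as text. The following is a minimal sketch, not part of the original answer; the small `data` array is a hypothetical stand-in for the output of read_mixed_csv.

```matlab
% Hypothetical sample standing in for the output of read_mixed_csv
data = {'04','abc','12.5'; '','Test','7'};

nums  = str2double(data);   % same size as data; NaN where parsing fails
isNum = ~isnan(nums);       % logical mask of cells that hold plain numbers

% Replace only the numeric-looking cells with doubles; text and empty
% strings are left untouched.
data(isNum) = num2cell(nums(isNum));
```

With this approach the empty string in the second row stays an empty string instead of becoming NaN.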

Given the sample you posted, this simple code should do the job:

    fid = fopen('file.csv','r');
    C = textscan(fid, repmat('%s',1,10), 'delimiter',';', 'CollectOutput',true);
    C = C{1};
    fclose(fid);

You can then format the columns according to their types. For example, if the first column contains only integers, we can convert it like this:

 C(:,1) = num2cell( str2double(C(:,1)) )

Similarly, if you wish to convert the 8th column from hex to decimal, you can use HEX2DEC:

 C(:,8) = cellfun(@hex2dec, strrep(C(:,8),'0x',''), 'UniformOutput',false);

The resulting cell array looks as follows:

    C =
        [  4]    'abc'    'def'    'ghj'    'klm'    ''            ''            []        ''    ''
        [NaN]    ''       ''       ''       ''       'Test'        'text'        [ 255]    ''    ''
        [NaN]    ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    [3855]    ''    ''
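The call above hard-codes 10 columns. If the column count is not known beforehand but every row has the same number of fields, it can be inferred from the first line. This is a rough sketch under that assumption; the temporary file and the variable names `fileName`, `firstLine` and `nCols` are introduced here for illustration only.

```matlab
% Write the sample from the question to a temporary file (illustration only)
fileName = [tempname '.csv'];
fid = fopen(fileName,'w');
fprintf(fid,'%s\n','04;abc;def;ghj;klm;;;;;', ...
                   ';;;;;Test;text;0xFF;;', ...
                   ';;;;;asdfhsdf;dsafdsag;0x0F0F;;');
fclose(fid);

% Count the delimiters in the first line: fields = delimiters + 1
fid = fopen(fileName,'r');
firstLine = fgetl(fid);
nCols = numel(strfind(firstLine,';')) + 1;
frewind(fid);                       % return to the start of the file

% Same textscan call as above, with the inferred column count
C = textscan(fid, repmat('%s',1,nCols), 'Delimiter',';', 'CollectOutput',true);
C = C{1};
fclose(fid);
delete(fileName);                   % clean up the temporary file
```

If rows can have different numbers of fields, this shortcut does not apply and a line-by-line approach is the safer choice.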

In R2013b or later you can use a table:

    >> table = readtable('myfile.txt','Delimiter',';','ReadVariableNames',false)
    >> table =

        Var1    Var2     Var3     Var4     Var5        Var6          Var7         Var8      Var9    Var10
        ____    _____    _____    _____    _____    __________    __________    ________    ____    _____

          4     'abc'    'def'    'ghj'    'klm'    ''            ''            ''          NaN     NaN
        NaN     ''       ''       ''       ''       'Test'        'text'        '0xFF'      NaN     NaN
        NaN     ''       ''       ''       ''       'asdfhsdf'    'dsafdsag'    '0x0F0F'    NaN     NaN

More information is available in the MATLAB documentation for readtable.

Have you tried the "CSVIMPORT" function from the File Exchange? I have not tried it myself, but it claims to handle all combinations of text and numbers.

http://www.mathworks.com/matlabcentral/fileexchange/23573-csvimport

Use xlsread; it works on .csv files just as it does on .xls files. Specify that you want three outputs:

 [num char raw] = xlsread('your_filename.csv')

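It will give you an array containing only the numeric data (num), an array containing only the character data (char), and an array containing all data types in the same layout as the .csv file (raw).

Depending on the format of your file, importdata might work.

You can store strings in a cell array. Type "doc cell" for more information.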
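As a small illustration of how a cell array gives free access to each element, the sketch below builds a 2-by-3 cell array by hand. The values are made up for the example and are not read from a file.

```matlab
data = cell(2,3);          % 2-by-3 cell array, every cell initially empty
data{1,1} = '04';
data{1,2} = 'abc';
data{2,3} = '0xFF';

s      = data{1,2};        % curly braces return the content of one cell
row    = data(1,:);        % parentheses return a 1-by-3 cell array
filled = ~cellfun(@isempty, data);   % logical mask of the non-empty cells
```

Curly braces are the usual choice when the string itself is needed, and parentheses when a subset of cells should stay a cell array.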

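I recommend looking at the dataset array.

The dataset array is a data type that ships with Statistics Toolbox. It is specifically designed to store heterogeneous data in a single container.

The Statistics Toolbox demo page contains a couple of videos that show some of the dataset array features. The first is titled "An Introduction to Dataset Arrays". The second is titled "An Introduction to Joins".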

http://www.mathworks.com/products/statistics/demos.html

If your input file has a fixed number of comma-separated columns and you know which columns are strings, it might be best to use the function

 textscan()

Note that you can specify a format in which a string field is read up to a maximum number of characters or until a delimiter (comma) is found.
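For instance, with three comma-separated columns where the first and third are strings and the second is an integer, the call could look roughly like this. The sample text and variable names are hypothetical, and the input is passed as a character vector instead of a file identifier to keep the sketch self-contained.

```matlab
% Hypothetical comma-separated sample: string, integer, string
txt = sprintf('abc,1,x\ndef,2,y\n');

% One conversion specifier per column: %s for strings, %d for integers
C = textscan(txt, '%s %d %s', 'Delimiter', ',');

names  = C{1};   % 2-by-1 cell array of strings
counts = C{2};   % 2-by-1 int32 vector
flags  = C{3};   % 2-by-1 cell array of strings
```

Each column comes back in its own cell of C, so numeric columns are already numeric and need no later conversion.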

    % Assuming that the dataset is ";"-delimited and each line ends with ";"
    fid = fopen('sampledata.csv');
    tline = fgetl(fid);
    u=sprintf('%c',tline); c=length(u);
    id=findstr(u,';'); n=length(id);
    data=cell(1,n);
    for I=1:n
        if I==1
            data{1,I}=u(1:id(I)-1);
        else
            data{1,I}=u(id(I-1)+1:id(I)-1);
        end
    end
    ct=1;
    while ischar(tline)
        ct=ct+1;
        tline = fgetl(fid);
        u=sprintf('%c',tline);
        id=findstr(u,';');
        if ~isempty(id)
            for I=1:n
                if I==1
                    data{ct,I}=u(1:id(I)-1);
                else
                    data{ct,I}=u(id(I-1)+1:id(I)-1);
                end
            end
        end
    end
    fclose(fid);
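In MATLAB releases that provide strsplit, the manual index arithmetic above can be replaced by a single call per line. This is a sketch of that alternative, not the original poster's method; `rowText` is a hypothetical variable holding one line as returned by fgetl.

```matlab
% One line of the question's sample, as fgetl would return it
rowText = ';;;;;Test;text;0xFF;;';

% 'CollapseDelimiters', false keeps empty fields between adjacent delimiters
fields = strsplit(rowText, ';', 'CollapseDelimiters', false);

nFields = numel(fields);   % number of fields = number of delimiters + 1
```

Unlike the loop above, this also keeps the empty field after the final delimiter, so a line with nine semicolons yields ten fields.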
