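How can I read large worksheets from a large Excel file (27 MB+) with PHPExcel?
I have large Excel worksheets that I want to read into MySQL using PHPExcel.
I am using the recent patch that lets you read worksheets without opening the whole file. That way I can read one worksheet at a time.
However, one Excel file is 27 MB. I can read the first worksheet successfully because it is small, but the second worksheet is so large that the cron job that started processing at 22:00 had not finished by 8:00 AM. The worksheet is simply too big.
Is there any way to read a worksheet row by row, for example something like this: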
    $inputFileType = 'Excel2007';
    $inputFileName = 'big_file.xlsx';
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $worksheetNames = $objReader->listWorksheetNames($inputFileName);

    foreach ($worksheetNames as $sheetName) {
        // BELOW IS "WISH CODE":
        for ($row = 1; $row <= $max_rows; $row += 100) {
            $dataset = $objReader->getWorksheetWithRows($row, $row + 100);
            save_dataset_to_database($dataset);
        }
    }
Addendum
@mark, I used the code you posted to create the following example:
    function readRowsFromWorksheet()
    {
        $file_name = htmlentities($_POST['file_name']);
        $file_type = htmlentities($_POST['file_type']);

        echo 'Read rows from worksheet:<br />';
        debug_log('----------start');

        $objReader = PHPExcel_IOFactory::createReader($file_type);
        $chunkSize = 20;
        $chunkFilter = new ChunkReadFilter();
        $objReader->setReadFilter($chunkFilter);

        for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
            $chunkFilter->setRows($startRow, $chunkSize);
            $objPHPExcel = $objReader->load('data/' . $file_name);
            debug_log('reading chunk starting at row ' . $startRow);
            $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
            var_dump($sheetData);
            echo '<hr />';
        }
        debug_log('end');
    }
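As the log file below shows, it runs fine on a small 8 KB Excel file, but when I run it on a 3 MB Excel file it never gets past the first chunk. Is there any way to optimize this code for performance? As it stands, it does not look fast enough to pull chunks out of a large Excel file: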
    2011-01-12 11:07:15: ----------start
    2011-01-12 11:07:15: reading chunk starting at row 2
    2011-01-12 11:07:15: reading chunk starting at row 22
    2011-01-12 11:07:15: reading chunk starting at row 42
    2011-01-12 11:07:15: reading chunk starting at row 62
    2011-01-12 11:07:15: reading chunk starting at row 82
    2011-01-12 11:07:15: reading chunk starting at row 102
    2011-01-12 11:07:15: reading chunk starting at row 122
    2011-01-12 11:07:15: reading chunk starting at row 142
    2011-01-12 11:07:15: reading chunk starting at row 162
    2011-01-12 11:07:15: reading chunk starting at row 182
    2011-01-12 11:07:15: reading chunk starting at row 202
    2011-01-12 11:07:15: reading chunk starting at row 222
    2011-01-12 11:07:15: end
    2011-01-12 11:07:52: ----------start
    2011-01-12 11:08:01: reading chunk starting at row 2
    (...at 11:18, CPU usage at 93% still running...)
Addendum 2
When I comment out:
    //$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
    //var_dump($sheetData);
then it parses at an acceptable speed (about 2 rows per second). Is there any way to improve the performance of toArray()?
    2011-01-12 11:40:51: ----------start
    2011-01-12 11:40:59: reading chunk starting at row 2
    2011-01-12 11:41:07: reading chunk starting at row 22
    2011-01-12 11:41:14: reading chunk starting at row 42
    2011-01-12 11:41:22: reading chunk starting at row 62
    2011-01-12 11:41:29: reading chunk starting at row 82
    2011-01-12 11:41:37: reading chunk starting at row 102
    2011-01-12 11:41:45: reading chunk starting at row 122
    2011-01-12 11:41:52: reading chunk starting at row 142
    2011-01-12 11:42:00: reading chunk starting at row 162
    2011-01-12 11:42:07: reading chunk starting at row 182
    2011-01-12 11:42:15: reading chunk starting at row 202
    2011-01-12 11:42:22: reading chunk starting at row 222
    2011-01-12 11:42:22: end
Addendum 3
This seems to work adequately, at least on the 3 MB file:
    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',
            $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
        $chunkFilter->setRows($startRow, $chunkSize);
        $objPHPExcel = $objReader->load('data/' . $file_name);
        debug_log('reading chunk starting at row ' . $startRow);

        foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
            $cellIterator = $row->getCellIterator();
            $cellIterator->setIterateOnlyExistingCells(false);
            echo '<tr>';
            foreach ($cellIterator as $cell) {
                if (!is_null($cell)) {
                    //$value = $cell->getCalculatedValue();
                    $rawValue = $cell->getValue();
                    debug_log($rawValue);
                }
            }
        }
    }
It is possible to read a worksheet in "chunks" using Read Filters, although I can make no guarantees about efficiency.
    $inputFileType = 'Excel5';
    $inputFileName = './sampleData/example2.xls';

    /** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */
    class chunkReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $_startRow = 0;
        private $_endRow = 0;

        /** Set the list of rows that we want to read */
        public function setRows($startRow, $chunkSize)
        {
            $this->_startRow = $startRow;
            $this->_endRow = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '')
        {
            // Only read the heading row, and the rows that are configured
            // in $this->_startRow and $this->_endRow
            if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
                return true;
            }
            return false;
        }
    }

    echo 'Loading file ', pathinfo($inputFileName, PATHINFO_BASENAME),
        ' using IOFactory with a defined reader type of ', $inputFileType, '<br />';

    /** Create a new Reader of the type defined in $inputFileType **/
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);

    echo '<hr />';

    /** Define how many rows we want to read for each "chunk" **/
    $chunkSize = 20;

    /** Create a new Instance of our Read Filter **/
    $chunkFilter = new chunkReadFilter();

    /** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
    $objReader->setReadFilter($chunkFilter);

    /** Loop to read our worksheet in "chunk size" blocks **/
    /** $startRow is set to 2 initially because we always read the headings in row #1 **/
    for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
        echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',
            $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';

        /** Tell the Read Filter, the limits on which rows we want to read this iteration **/
        $chunkFilter->setRows($startRow, $chunkSize);

        /** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
        $objPHPExcel = $objReader->load($inputFileName);

        // Do some processing here
        $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
        echo '<br /><br />';
    }
Note that this Read Filter will always read the first row of the worksheet, as well as the rows defined by the chunk rule.
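When using a read filter, PHPExcel still parses the entire file, but only loads those cells that match the defined read filter, so it only uses the memory required by that number of cells. However, it parses the file multiple times, once for each chunk, so it will be slower. This example reads 20 rows at a time: to read row by row, simply set $chunkSize to 1.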
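To get a feel for that cost, the chunk boundaries can be computed up front. The following is a minimal sketch in plain PHP; `chunkRanges` is a hypothetical helper, not part of PHPExcel. Each range it returns corresponds to one `load()` call in the loop above, and therefore to one full parse of the file.

```php
<?php
/**
 * Hypothetical helper (not a PHPExcel function): list the inclusive
 * [firstRow, lastRow] pairs that a chunked read loop would request.
 */
function chunkRanges($firstRow, $lastRow, $chunkSize)
{
    $ranges = array();
    for ($start = $firstRow; $start <= $lastRow; $start += $chunkSize) {
        // Clip the final chunk so it never runs past the last data row.
        $ranges[] = array($start, min($start + $chunkSize - 1, $lastRow));
    }
    return $ranges;
}

// Rows 2..240 in chunks of 20: one full parse of the file per range.
echo count(chunkRanges(2, 240, 20)), " loads\n"; // prints "12 loads"
```

With a chunk size of 1 the same rows need 239 loads, so very small chunks save memory at the price of a lot of repeated parsing.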
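This can also cause problems if you have formulae that reference cells in different "chunks", because the data simply isn't available for cells outside the current "chunk".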
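One way to spot such formulae before trusting a calculated value is to compare the rows a formula mentions with the rows the filter loaded. This is only a rough sketch: `formulaLeavesChunk` is a hypothetical helper, and its regular expression is deliberately naive (it ignores sheet-qualified references, whole-column ranges and named ranges, and can misread function names that end in digits).

```php
<?php
/**
 * Hypothetical helper: report whether a formula string refers to a row
 * that the chunk filter above would NOT have loaded. Row 1 always counts
 * as loaded because the filter always reads the heading row.
 */
function formulaLeavesChunk($formula, $startRow, $chunkSize)
{
    $endRow = $startRow + $chunkSize; // exclusive, as in setRows()

    // Naive A1-style match: optional "$", 1-3 column letters, optional "$", row digits.
    preg_match_all('/\$?[A-Z]{1,3}\$?([0-9]+)/', $formula, $matches);

    foreach ($matches[1] as $rowText) {
        $row = (int) $rowText;
        $loaded = ($row == 1) || ($row >= $startRow && $row < $endRow);
        if (!$loaded) {
            return true;
        }
    }
    return false;
}

var_dump(formulaLeavesChunk('=A5+B300', 2, 20));     // bool(true): row 300 is outside rows 2..21
var_dump(formulaLeavesChunk('=SUM(A2:A21)', 2, 20)); // bool(false)
```

Cells flagged this way could be read with `getValue()` and recalculated later, once all the rows they depend on have been imported.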
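At the moment the best option for reading .xlsx, .csv and .ods files is spreadsheet-reader ( https://github.com/nuovo/spreadsheet-reader ), because it can read a file without loading all of it into memory. For the .xls extension it has limitations, because it uses PHPExcel for reading.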
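Whichever reader hands back rows one at a time, the rows usually need to be grouped before they are written to MySQL. A minimal sketch of that grouping step, assuming PHP 5.5 or later for generators; `batchRows` is a hypothetical helper and works on any array or iterator of rows, not on a specific library.

```php
<?php
/**
 * Hypothetical helper: group rows from any iterable into arrays of at
 * most $batchSize rows, holding only one batch in memory at a time.
 */
function batchRows($rows, $batchSize)
{
    $batch = array();
    foreach ($rows as $row) {
        $batch[] = $row;
        if (count($batch) === $batchSize) {
            yield $batch;
            $batch = array();
        }
    }
    // Emit the final, possibly shorter, batch.
    if (count($batch) > 0) {
        yield $batch;
    }
}

// Stand-in data; in practice $rows would be the reader's row iterator.
foreach (batchRows(array('r1', 'r2', 'r3', 'r4', 'r5'), 2) as $batch) {
    echo implode(',', $batch), "\n";
}
```

Each yielded batch can then be handed to whatever function performs the database insert.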
/* This is ChunkReadFilter.php */
    <?php
    class ChunkReadFilter implements PHPExcel_Reader_IReadFilter
    {
        private $_startRow = 0;
        private $_endRow = 0;

        /** Set the list of rows that we want to read */
        public function setRows($startRow, $chunkSize)
        {
            $this->_startRow = $startRow;
            $this->_endRow = $startRow + $chunkSize;
        }

        public function readCell($column, $row, $worksheetName = '')
        {
            // Only read the heading row, and the rows that are configured
            // in $this->_startRow and $this->_endRow
            if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
                return true;
            }
            return false;
        }
    }
    ?>
/* And this is index.php. The call at the end of the file is a basic implementation, not a perfect one. */
    <?php
    require_once './Classes/PHPExcel/IOFactory.php';
    require_once 'ChunkReadFilter.php';

    class Excelreader
    {
        /**
         * Read data from an Excel file in chunks and insert it into the database
         * @param string $filePath
         * @param integer $chunkSize
         */
        public function readFileAndDumpInDB($filePath, $chunkSize)
        {
            echo("Loading file " . $filePath . " ....." . PHP_EOL);

            /** Create a new Reader of the type that has been identified **/
            $objReader = PHPExcel_IOFactory::createReader(PHPExcel_IOFactory::identify($filePath));
            $spreadsheetInfo = $objReader->listWorksheetInfo($filePath);

            /** Create a new Instance of our Read Filter **/
            $chunkFilter = new ChunkReadFilter();

            /** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
            $objReader->setReadFilter($chunkFilter);
            $objReader->setReadDataOnly(true);
            //$objReader->setLoadSheetsOnly("Sheet1");

            // get header column name
            $chunkFilter->setRows(0, 1);
            echo("Reading file " . $filePath . PHP_EOL . "<br>");
            $totalRows = $spreadsheetInfo[0]['totalRows'];
            echo("Total rows in file " . $totalRows . " " . PHP_EOL . "<br>");

            /** Loop to read our worksheet in "chunk size" blocks **/
            /** $startRow is set to 1 initially because we always read the headings in row #1 **/
            for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) {
                echo("Loading WorkSheet for rows " . $startRow . " to "
                    . ($startRow + $chunkSize - 1) . PHP_EOL . "<br>");
                $i = 0;

                /** Tell the Read Filter, the limits on which rows we want to read this iteration **/
                $chunkFilter->setRows($startRow, $chunkSize);

                /** Load only the rows that match our filter from $filePath to a PHPExcel Object **/
                $objPHPExcel = $objReader->load($filePath);
                $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false);
                $startIndex = ($startRow == 1) ? $startRow : $startRow - 1;

                // dumping in database
                if (!empty($sheetData) && $startRow < $totalRows) {
                    /**
                     * $this->dumpInDb(array_slice($sheetData, $startIndex, $chunkSize));
                     */
                    echo "<table border='1'>";
                    foreach ($sheetData as $key => $value) {
                        $i++;
                        if ($value[0] != null) {
                            echo "<tr><td>id:$i</td><td>{$value[0]} </td><td>{$value[1]} </td>"
                                . "<td>{$value[2]} </td><td>{$value[3]} </td></tr>";
                        }
                    }
                    echo "</table><br/><br/>";
                }

                // Free the workbook before loading the next chunk
                $objPHPExcel->disconnectWorksheets();
                unset($objPHPExcel, $sheetData);
            }
            echo("File " . $filePath . " has been uploaded successfully in database" . PHP_EOL . "<br>");
        }

        /**
         * Insert data into database table
         * @param Array $sheetData
         * @return boolean
         * @throws Exception
         * NOTE: THE DATABASE METHOD WAS NOT WORKING IN THE ORIGINAL POST, ONLY THE PUBLIC
         * METHOD. The version below corrects the SQL syntax and the mixed mysql_/mysqli_
         * calls, but it is untested and assumes DbAdapter returns a mysqli connection.
         */
        protected function dumpInDb($sheetData)
        {
            $con = DbAdapter::getDBConnection();
            $values = array();
            // Starts at index 1, as in the original loop
            for ($i = 1; $i < count($sheetData); $i++) {
                $values[] = "('" . mysqli_real_escape_string($con, (string) $sheetData[$i][0]) . "','"
                    . mysqli_real_escape_string($con, (string) $sheetData[$i][1]) . "')";
            }
            if (empty($values)) {
                mysqli_close($con);
                return true;
            }
            $query = "INSERT INTO employe(name,address) VALUES " . implode(",", $values)
                . " ON DUPLICATE KEY UPDATE name=VALUES(name), address=VALUES(address)";
            if (mysqli_query($con, $query)) {
                mysqli_close($con);
                return true;
            }
            // Read the error before closing the connection
            $error = mysqli_error($con);
            mysqli_close($con);
            throw new Exception($error);
        }

        /**
         * This function returns list of files corresponding to given directory path
         * @param String $dataFolderPath
         * @return Array list of file
         */
        protected function getFileList($dataFolderPath)
        {
            if (!is_dir($dataFolderPath)) {
                throw new Exception("Directory " . $dataFolderPath . " does not exist");
            }
            $root = scandir($dataFolderPath);
            $fileList = array();
            foreach ($root as $value) {
                if ($value === '.' || $value === '..') {
                    continue;
                }
                if (is_file("$dataFolderPath/$value")) {
                    $fileList[] = "$dataFolderPath/$value";
                    continue;
                }
            }
            return $fileList;
        }
    }

    $inputFileName = './prueba_para_batch.xls';
    $excelReader = new Excelreader();
    $excelReader->readFileAndDumpInDB($inputFileName, 500);
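Since the database method above builds its SQL by string concatenation, a parameterized multi-row INSERT may be easier to get right. The following is a sketch only: `buildBulkInsert` is a hypothetical helper that is not part of the code above, and it assumes the table and column names are trusted constants (they are not escaped).

```php
<?php
/**
 * Hypothetical helper: build one multi-row INSERT with "?" placeholders
 * and the matching flat parameter list for a prepared statement.
 */
function buildBulkInsert($table, array $columns, array $rows)
{
    if (count($rows) === 0) {
        throw new InvalidArgumentException('No rows to insert');
    }
    $width = count($columns);

    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, $width, '?')) . ')';
    $sql = 'INSERT INTO ' . $table . ' (' . implode(', ', $columns) . ') VALUES '
        . implode(', ', array_fill(0, count($rows), $rowPlaceholder));

    // Flatten the row values, trimming or null-padding each row to the column count.
    $params = array();
    foreach ($rows as $row) {
        $cells = array_pad(array_slice(array_values($row), 0, $width), $width, null);
        foreach ($cells as $cell) {
            $params[] = $cell;
        }
    }
    return array($sql, $params);
}

list($sql, $params) = buildBulkInsert(
    'employe',
    array('name', 'address'),
    array(array('Ann', '1 Main St'), array('Bob', '2 Oak Ave'))
);
echo $sql, "\n"; // INSERT INTO employe (name, address) VALUES (?, ?), (?, ?)
```

With PDO, the result could be run as `$pdo->prepare($sql)->execute($params)`, which leaves the escaping of cell values to the driver.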