How can I read a large worksheet from a large Excel file (27MB+) using PHPExcel?
I have large Excel worksheets that I want to read into MySQL using PHPExcel.
I am using the approach that lets you read a worksheet without opening the whole file, so I can read one worksheet at a time.
However, one Excel file is 27MB. I can read the first worksheet successfully because it is small, but the second worksheet is so large that the cron job that starts the process at 22:00 had not finished by 8:00 the next morning.
Is there a way to read a worksheet a batch of rows at a time, for example:
$inputFileType = 'Excel2007';
$inputFileName = 'big_file.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);
foreach ($worksheetNames as $sheetName) {
//BELOW IS "WISH CODE" (getWorksheetWithRows() does not exist in PHPExcel):
for ($row = 1; $row <= $max_rows; $row += 100) {
$dataset = $objReader->getWorksheetWithRows($row, $row+100);
save_dataset_to_database($dataset);
}
}
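Addendum 2
When I comment out these lines:
//$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
//var_dump($sheetData);
it parses at an acceptable speed (about 2 rows per second). Is there any way to improve the performance of toArray()?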
Addendum 3
For example, this seems to work adequately, at least on a 3 MB file:
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ', $startRow, ' to ', ($startRow + $chunkSize - 1), '<br />';
$chunkFilter->setRows($startRow, $chunkSize);
$objPHPExcel = $objReader->load('data/' . $file_name);
debug_log('reading chunk starting at row ' . $startRow);
foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
$cellIterator = $row->getCellIterator();
$cellIterator->setIterateOnlyExistingCells(false);
echo '<tr>';
foreach ($cellIterator as $cell) {
if (!is_null($cell)) {
//$value = $cell->getCalculatedValue();
$rawValue = $cell->getValue();
debug_log($rawValue);
}
}
}
}
Although I can't vouch for its efficiency, you can use a read filter to read a worksheet in "chunks":
$inputFileType = 'Excel5';
$inputFileName = './sampleData/example2.xls';
/** Define a Read Filter class implementing PHPExcel_Reader_IReadFilter */
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
private $_startRow = 0;
private $_endRow = 0;
/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
return true;
}
return false;
}
}
echo 'Loading file ',pathinfo($inputFileName,PATHINFO_BASENAME),' using IOFactory with a defined reader type of ',$inputFileType,'<br />';
/** Create a new Reader of the type defined in $inputFileType **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
echo '<hr />';
/** Define how many rows we want to read for each "chunk" **/
$chunkSize = 20;
/** Create a new Instance of our Read Filter **/
$chunkFilter = new chunkReadFilter();
/** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
$objReader->setReadFilter($chunkFilter);
/** Loop to read our worksheet in "chunk size" blocks **/
/** $startRow is set to 2 initially because we always read the headings in row #1 **/
for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {
echo 'Loading WorkSheet using configurable filter for headings row 1 and for rows ',$startRow,' to ',($startRow+$chunkSize-1),'<br />';
/** Tell the Read Filter, the limits on which rows we want to read this iteration **/
$chunkFilter->setRows($startRow,$chunkSize);
/** Load only the rows that match our filter from $inputFileName to a PHPExcel Object **/
$objPHPExcel = $objReader->load($inputFileName);
// Do some processing here
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null,true,true,true);
var_dump($sheetData);
echo '<br /><br />';
}
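Note that this read filter always reads the first row of the worksheet as well as the rows defined by the chunk rule.
When you use a read filter, PHPExcel still parses the entire file, but it only loads the cells that match the defined filter, so it only uses the memory needed for that number of cells. However, it parses the file once per chunk, so it is slower overall. This example reads 20 rows at a time; to read row by row, simply set $chunkSize to 1.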
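To make the cost of the chunk size concrete, here is a minimal sketch that lists the row ranges the loop above would request. chunkRanges() is a hypothetical helper written for illustration, not part of PHPExcel, and it assumes row 1 is the heading row as in the example.

```php
<?php
/**
 * Sketch: list the [first, last] data-row ranges a chunked read would cover.
 * chunkRanges() is a hypothetical helper, not part of PHPExcel.
 * Row 1 is assumed to be the heading row, so data starts at row 2.
 */
function chunkRanges($totalRows, $chunkSize, $firstDataRow = 2)
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }
    $ranges = array();
    for ($startRow = $firstDataRow; $startRow <= $totalRows; $startRow += $chunkSize) {
        // The read filter accepts rows >= $startRow and < $startRow + $chunkSize.
        $lastRow = min($startRow + $chunkSize - 1, $totalRows);
        $ranges[] = array($startRow, $lastRow);
    }
    return $ranges;
}

// Every range corresponds to one load() call, i.e. one full parse of the file.
foreach (chunkRanges(240, 100) as $range) {
    echo 'rows ', $range[0], ' to ', $range[1], PHP_EOL;
}
```

For a 240-row sheet, a chunk size of 20 gives 12 ranges and therefore 12 parses of the file, while a chunk size of 1 gives 239. Since each chunk re-parses the whole file, very small chunks trade a little memory for a lot of time.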
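Chunked reading also causes problems if you have formulas that reference cells in a different "chunk", because no data is available for cells outside the current "chunk".

Currently, the best option for reading .xlsx, .csv and .ods files is spreadsheet-reader, because it can read files without loading them entirely into memory. For the .xls extension it has some limitations, because it uses PHPExcel for reading.

This is ChunkReadFilter.php: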
<?php
Class ChunkReadFilter implements PHPExcel_Reader_IReadFilter {
private $_startRow = 0;
private $_endRow = 0;
/** Set the list of rows that we want to read */
public function setRows($startRow, $chunkSize) {
$this->_startRow = $startRow;
$this->_endRow = $startRow + $chunkSize;
}
public function readCell($column, $row, $worksheetName = '') {
// Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
return true;
}
return false;
}
}
?>
This is index.php; at the end of the file is an imperfect but basic implementation:
<?php
require_once './Classes/PHPExcel/IOFactory.php';
require_once 'ChunkReadFilter.php';
class Excelreader {
/**
* This function is used to read data from excel file in chunks and insert into database
* @param string $filePath
* @param integer $chunkSize
*/
public function readFileAndDumpInDB($filePath, $chunkSize) {
echo("Loading file " . $filePath . " ....." . PHP_EOL);
/** Create a new Reader of the type that has been identified * */
$objReader = PHPExcel_IOFactory::createReader(PHPExcel_IOFactory::identify($filePath));
$spreadsheetInfo = $objReader->listWorksheetInfo($filePath);
/** Create a new Instance of our Read Filter * */
$chunkFilter = new ChunkReadFilter();
/** Tell the Reader that we want to use the Read Filter that we've Instantiated * */
$objReader->setReadFilter($chunkFilter);
$objReader->setReadDataOnly(true);
//$objReader->setLoadSheetsOnly("Sheet1");
//get header column name
$chunkFilter->setRows(0, 1);
echo("Reading file " . $filePath . PHP_EOL . "<br>");
$totalRows = $spreadsheetInfo[0]['totalRows'];
echo("Total rows in file " . $totalRows . " " . PHP_EOL . "<br>");
/** Loop to read our worksheet in "chunk size" blocks * */
/** $startRow is set to 1 initially because we always read the headings in row #1 * */
for ($startRow = 1; $startRow <= $totalRows; $startRow += $chunkSize) {
echo("Loading WorkSheet for rows " . $startRow . " to " . ($startRow + $chunkSize - 1) . PHP_EOL . "<br>");
$i = 0;
/** Tell the Read Filter, the limits on which rows we want to read this iteration * */
$chunkFilter->setRows($startRow, $chunkSize);
/** Load only the rows that match our filter from $inputFileName to a PHPExcel Object * */
$objPHPExcel = $objReader->load($filePath);
$sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, false);
$startIndex = ($startRow == 1) ? $startRow : $startRow - 1;
//dumping in database
if (!empty($sheetData) && $startRow < $totalRows) {
/**
* $this->dumpInDb(array_slice($sheetData, $startIndex, $chunkSize));
*/
echo "<table border='1'>";
foreach ($sheetData as $key => $value) {
$i++;
if ($value[0] != null) {
echo "<tr><td>id:$i</td><td>{$value[0]} </td><td>{$value[1]} </td><td>{$value[2]} </td><td>{$value[3]} </td></tr>";
}
}
echo "</table><br/><br/>";
}
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel, $sheetData);
}
echo("File " . $filePath . " has been uploaded successfully in database" . PHP_EOL . "<br>");
}
/**
* Insert data into database table
* @param Array $sheetData
* @return boolean
* @throws Exception
* NOTE: this database method is untested; only the public method above was run.
*/
protected function dumpInDb($sheetData) {
// Assumes DbAdapter::getDBConnection() returns a mysqli connection
$con = DbAdapter::getDBConnection();
$values = array();
for ($i = 1; $i < count($sheetData); $i++) {
$values[] = "('" . mysqli_real_escape_string($con, $sheetData[$i][0]) . "',"
. "'" . mysqli_real_escape_string($con, $sheetData[$i][1]) . "')";
}
if (empty($values)) {
// Nothing to insert
return true;
}
$query = "INSERT INTO employe(name,address) VALUES " . implode(",", $values)
. " ON DUPLICATE KEY UPDATE name=VALUES(name), address=VALUES(address)";
if (mysqli_query($con, $query)) {
mysqli_close($con);
return true;
} else {
// Read the error message before closing the connection
$error = mysqli_error($con);
mysqli_close($con);
throw new Exception($error);
}
}
/**
* This function returns list of files corresponding to given directory path
* @param String $dataFolderPath
* @return Array list of file
*/
protected function getFileList($dataFolderPath) {
if (!is_dir($dataFolderPath)) {
throw new Exception("Directory " . $dataFolderPath . " is not exist");
}
$root = scandir($dataFolderPath);
$fileList = array();
foreach ($root as $value) {
if ($value === '.' || $value === '..') {
continue;
}
if (is_file("$dataFolderPath/$value")) {
$fileList[] = "$dataFolderPath/$value";
continue;
}
}
return $fileList;
}
}
$inputFileName = './prueba_para_batch.xls';
$excelReader = new Excelreader();
$excelReader->readFileAndDumpInDB($inputFileName, 500);
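Since the database method above is flagged as not working, here is a hedged sketch of one alternative: building a single multi-row INSERT with "?" placeholders and a flat parameter list, rather than concatenating escaped strings. buildBatchInsert() is a hypothetical helper introduced only for illustration; the table and column names are taken from dumpInDb().

```php
<?php
/**
 * Sketch: build one multi-row INSERT with "?" placeholders plus a flat
 * parameter list. buildBatchInsert() is a hypothetical helper; table and
 * column names must come from trusted code, never from the spreadsheet.
 */
function buildBatchInsert($table, array $columns, array $rows)
{
    if (count($rows) === 0) {
        throw new InvalidArgumentException('no rows to insert');
    }
    $width = count($columns);
    $tuple = '(' . implode(',', array_fill(0, $width, '?')) . ')';
    $sql = 'INSERT INTO ' . $table . ' (' . implode(',', $columns) . ') VALUES '
         . implode(',', array_fill(0, count($rows), $tuple));

    $params = array();
    foreach ($rows as $row) {
        // Keep only as many cells as there are columns; pad short rows with null.
        $cells = array_pad(array_slice(array_values($row), 0, $width), $width, null);
        foreach ($cells as $cell) {
            $params[] = $cell;
        }
    }
    return array($sql, $params);
}

list($sql, $params) = buildBatchInsert('employe', array('name', 'address'), array(
    array('Ann', '1 Main St'),
    array("O'Brien", '2 High St'),
));
echo $sql, PHP_EOL;
```

The resulting pair could then be handed to a prepared statement, for example PDO's prepare() followed by execute($params). Whether that fits the DbAdapter class used above is an assumption, since that class is not shown.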
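I'm testing the code in a file, but it tells me Class 'ChunkReadFilter' not found. If I remove implements PHPExcel_Reader_IReadFilter, it finds the class and tells me it must implement the interface PHPExcel_Reader_IReadFilter. I put require_once 'PHPExcelClasses/PHPExcel/Reader/IReadFilter.php' and require_once 'PHPExcelClasses/PHPExcel/Reader/IReader.php' at the top of the file, but if I implement the interface it still can't find the class. Do I need to include some other files?

I just had to put the ChunkReadFilter class above the main code; now it works, thanks.

I've reduced the chunk size to 1 row, but even so, on the largest worksheet in the 27MB Excel file I run out of memory after 50 seconds (allocated 1538523136), with the memory limit set to almost the maximum: ini_set("memory_limit", "3700M");. I'm using the last code sample above (Addendum 3), so I know it isn't calculating cells but giving me the raw values. Is there any other way to keep it from using so much memory?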
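One thing that may be worth trying, since readCell() also receives the column, is a filter that restricts columns as well as rows, so that fewer cells are loaded per chunk. The sketch below is an untested idea, not a confirmed fix for the 27MB case: ColumnChunkReadFilter is a hypothetical class name, and the stub interface exists only so the example runs without PHPExcel installed.

```php
<?php
// Stub so the sketch runs standalone; when PHPExcel is loaded first,
// the library's real interface is used and this block is skipped.
if (!interface_exists('PHPExcel_Reader_IReadFilter')) {
    interface PHPExcel_Reader_IReadFilter
    {
        public function readCell($column, $row, $worksheetName = '');
    }
}

/** Hypothetical filter: chunked rows, limited to a fixed list of columns. */
class ColumnChunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;
    private $_columns;

    public function __construct(array $columns)
    {
        $this->_columns = $columns;
    }

    /** Set the rows we want to read: $startRow up to, but excluding, $startRow + $chunkSize */
    public function setRows($startRow, $chunkSize)
    {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Reject any cell outside the wanted columns before checking rows.
        if (!in_array($column, $this->_columns, true)) {
            return false;
        }
        // Always keep the heading row, plus the rows of the current chunk.
        return $row == 1 || ($row >= $this->_startRow && $row < $this->_endRow);
    }
}

$filter = new ColumnChunkReadFilter(range('A', 'D'));
$filter->setRows(2, 500);
var_dump($filter->readCell('B', 10));
var_dump($filter->readCell('E', 10));
```

It would be passed to $objReader->setReadFilter() exactly like chunkReadFilter. Because the reader still parses the whole file on every load, this can at best reduce the memory held for loaded cells; it does not reduce parsing time.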