C# 使用Open XML从Excel读取完整表格…更快
警告:由于示例和结果导致帖子过长 这里有一些关于如何读取列之间有空单元格的开放XML电子表格行的线程。我从这里得出了一些答案 我能够从xlsx中读取一个表,但是它比从CSV读取慢10倍,而开放式XML结构应该会产生更好的结果 下面是我为测试代码库所得到的:C# 使用Open XML从Excel读取完整表格…更快,c#,excel,openxml,openxml-sdk,C#,Excel,Openxml,Openxml Sdk,警告:由于示例和结果导致帖子过长 这里有一些关于如何读取列之间有空单元格的开放XML电子表格行的线程。我从这里得出了一些答案 我能够从xlsx中读取一个表,但是它比从CSV读取慢10倍,而开放式XML结构应该会产生更好的结果 下面是我为测试代码库所得到的: foreach (Row r in sheetData.Descendants<Row>()) { sw.Start(); //find a row marked as "header" and get list of colu
foreach (Row r in sheetData.Descendants<Row>())
{
sw.Start();
//find a row marked as "header" and get list of columns that define width of table
if (!headerRowFound)
{
headerRowFound = CheckOXMLHeaderRow(r, workbookPart, out headerReferences);
if (!headerRowFound)
continue;
}
rowKey++;
//////////////////////////////////////////////////////////////////////
///////////////////here we are going to do work//////////////////////
////////////////////////////////////////////////////////////////////
AddRow(rowKey, cols);
sw.Stop();
Debug.WriteLine("XLSX Row added in \t" + sw.ElapsedTicks.ToString() + "\tticks");
sw.Reset();
}
这是MSDN的一个例子,它每行发出500000个滴答声。花了几分钟来分析5000行的电子表格。不可接受。
我们的目标是一行中的每一个细胞,无论是否存在
四,。我决定缩小范围,并尝试从所有无序的传入单元格中检索值到哈希表中
Hashtable cols = new Hashtable();
foreach (Cell c in r.Descendants<Cell>())
{
colKey++;
cols.Add(colKey, GetValueFromCell(c, workbookPart));
}
现在是每行500-1500个滴答声。尽管如此,如果我们只存储值而不按任何顺序存储,速度会非常快,这还不是一个解决方案
五,。为了确保保留列的顺序,我为每一新行创建了一个标题行的空克隆,并在解析现有单元格后,根据哈希表检索决定将它们放在何处
Hashtable cols = (Hashtable)emptyNewRow.Clone();
foreach (Cell c in r.Descendants<Cell>())
{
colKey = headerReferences[GetColumnName(c.CellReference)]; //what # column is this?
cols[colKey] = GetValueFromCell(c, workbookPart); //put value in that column
}
最终结果是每行9000-20000个滴答声。30秒用于5000个电子表格。可行,但不理想
这就是我停下来的地方。有没有办法加快速度?庞大的xlsx电子表格如何能以闪电般的速度加载,我在这里能做的最好的事情是5k行30秒
字典对我没有任何帮助,甚至没有1%的进步。我需要在哈希表的结果无论如何为旧式改造
附录:参考方法
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
public static string GetValueFromCell(Cell cell, WorkbookPart workbookPart)
{
int id;
string cellValue = cell.InnerText;
if (cellValue.Trim().Length > 0)
{
if (cell.DataType != null)
{
switch (cell.DataType.Value)
{
case CellValues.SharedString:
Int32.TryParse(cellValue, out id);
SharedStringItem item = GetSharedStringItemById(workbookPart, id);
if (item.Text != null)
{
cellValue = item.Text.Text;
}
else if (item.InnerText != null)
{
cellValue = item.InnerText;
}
else if (item.InnerXml != null)
{
cellValue = item.InnerXml;
}
break;
case CellValues.Boolean:
switch (cellValue)
{
case "0":
cellValue = "FALSE";
break;
default:
cellValue = "TRUE";
break;
}
break;
}
}
else
{
int excelDate;
if (Int32.TryParse(cellValue, out excelDate))
{
var styleIndex = (int)cell.StyleIndex.Value;
var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;
var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);
if (cellFormat.NumberFormatId != null)
{
var numberFormatId = cellFormat.NumberFormatId.Value;
var numberingFormat = numberingFormats.Cast<NumberingFormat>().SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);
if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("/yy")) //TODO here i should think of locales
{
DateTime dt = DateTime.FromOADate(excelDate);
cellValue = dt.ToString("MM/dd/yyyy");
}
}
}
}
}
return cellValue;
}
public static string GetCellValue(WorkbookPart wbPart, WorksheetPart wsPart, string addressName)
{
string value = String.Empty; //code from microsoft prefers null, but null is tough to work with
// Use its Worksheet property to get a reference to the cell
// whose address matches the address you supplied.
Cell theCell = wsPart.Worksheet.Descendants<Cell>().
Where(c => c.CellReference == addressName).FirstOrDefault();
// If the cell does not exist, return an empty string.
if (theCell != null)
{
value = theCell.InnerText;
// If the cell represents an integer number, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and
// Booleans individually. For shared strings, the code
// looks up the corresponding value in the shared string
// table. For Booleans, the code converts the value into
// the words TRUE or FALSE.
if (theCell.DataType != null)
{
switch (theCell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared strings table.
var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// If the shared string table is missing, something is wrong. Return the index that is in the cell.
//Otherwise, look up the correct text in the table.
if (stringTable != null)
{
value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
}
}
return value;
}
XML比CSV快?有这么多开销?嗯,这是一个假设,我不会从一开始。无论如何您是在调试模式还是发布模式下测量的?GetCellValue做什么?您只发布了GetValueFromCell。我同意认为裸CSV比多参数xml结构更容易阅读是有道理的,尽管我认为这些结构可能有一些优势。无论如何,你可能是对的。我添加了GetCellValue方法,它几乎取自msdn。它的检索方法与其他方法相同,但在我的例子中,行中的每个单元格持有者都被查询,因此效率低下。
Hashtable cols = (Hashtable)emptyNewRow.Clone();
foreach (Cell c in r.Descendants<Cell>())
{
colKey = headerReferences[GetColumnName(c.CellReference)]; //what # column is this?
cols[colKey] = GetValueFromCell(c, workbookPart); //put value in that column
}
public static string GetColumnName(string cellReference)
{
// Match the column name portion of the cell name.
Regex regex = new Regex("[A-Za-z]+");
Match match = regex.Match(cellReference);
return match.Value;
}
public static string GetValueFromCell(Cell cell, WorkbookPart workbookPart)
{
int id;
string cellValue = cell.InnerText;
if (cellValue.Trim().Length > 0)
{
if (cell.DataType != null)
{
switch (cell.DataType.Value)
{
case CellValues.SharedString:
Int32.TryParse(cellValue, out id);
SharedStringItem item = GetSharedStringItemById(workbookPart, id);
if (item.Text != null)
{
cellValue = item.Text.Text;
}
else if (item.InnerText != null)
{
cellValue = item.InnerText;
}
else if (item.InnerXml != null)
{
cellValue = item.InnerXml;
}
break;
case CellValues.Boolean:
switch (cellValue)
{
case "0":
cellValue = "FALSE";
break;
default:
cellValue = "TRUE";
break;
}
break;
}
}
else
{
int excelDate;
if (Int32.TryParse(cellValue, out excelDate))
{
var styleIndex = (int)cell.StyleIndex.Value;
var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;
var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);
if (cellFormat.NumberFormatId != null)
{
var numberFormatId = cellFormat.NumberFormatId.Value;
var numberingFormat = numberingFormats.Cast<NumberingFormat>().SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);
if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("/yy")) //TODO here i should think of locales
{
DateTime dt = DateTime.FromOADate(excelDate);
cellValue = dt.ToString("MM/dd/yyyy");
}
}
}
}
}
return cellValue;
}
public static string GetCellValue(WorkbookPart wbPart, WorksheetPart wsPart, string addressName)
{
string value = String.Empty; //code from microsoft prefers null, but null is tough to work with
// Use its Worksheet property to get a reference to the cell
// whose address matches the address you supplied.
Cell theCell = wsPart.Worksheet.Descendants<Cell>().
Where(c => c.CellReference == addressName).FirstOrDefault();
// If the cell does not exist, return an empty string.
if (theCell != null)
{
value = theCell.InnerText;
// If the cell represents an integer number, you are done.
// For dates, this code returns the serialized value that
// represents the date. The code handles strings and
// Booleans individually. For shared strings, the code
// looks up the corresponding value in the shared string
// table. For Booleans, the code converts the value into
// the words TRUE or FALSE.
if (theCell.DataType != null)
{
switch (theCell.DataType.Value)
{
case CellValues.SharedString:
// For shared strings, look up the value in the shared strings table.
var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
// If the shared string table is missing, something is wrong. Return the index that is in the cell.
//Otherwise, look up the correct text in the table.
if (stringTable != null)
{
value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
}
break;
case CellValues.Boolean:
switch (value)
{
case "0":
value = "FALSE";
break;
default:
value = "TRUE";
break;
}
break;
}
}
}
return value;
}