C#: Read a complete table from Excel using Open XML… faster

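Warning: this post is long because of the examples and results.

There are several threads here on how to read rows of an Open XML spreadsheet that have empty cells between columns. I derived some of my answers from them.

I am able to read a table from an xlsx file, but it is 10 times slower than reading from CSV, whereas the Open XML structure should yield better results.

Here is what I have as a test code base: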

foreach (Row r in sheetData.Descendants<Row>())
{
    sw.Start();

    //find a row marked as "header" and get list of columns that define width of table
    if (!headerRowFound)
    {
        headerRowFound = CheckOXMLHeaderRow(r, workbookPart, out headerReferences);
        if (!headerRowFound)
            continue;
    }

    rowKey++;
    //////////////////////////////////////////////////////////////////////
    ///////////////////here we are going to do work//////////////////////
    ////////////////////////////////////////////////////////////////////

    AddRow(rowKey, cols);
    sw.Stop();
    Debug.WriteLine("XLSX Row added in \t" + sw.ElapsedTicks.ToString() + "\tticks");
    sw.Reset();
}
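Using the MSDN example, this clocks in at 500,000 ticks per row. It took minutes to parse a 5,000-row spreadsheet. Unacceptable. That approach targets every cell in a row, whether it exists or not.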

4. I decided to scale back and tried retrieving the values from all incoming cells, in no particular order, into a Hashtable.

Hashtable cols = new Hashtable();
foreach (Cell c in r.Descendants<Cell>())
{
    colKey++;
    cols.Add(colKey, GetValueFromCell(c, workbookPart));
}
Now it is 500-1,500 ticks per row. It is only this fast because the values are stored in no particular order, so it is not a solution yet.

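5. To make sure the column order is preserved, I create an empty clone of the header row for each new row and, after parsing each existing cell, decide where to put it based on a Hashtable lookup.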

Hashtable cols = (Hashtable)emptyNewRow.Clone();                        
foreach (Cell c in r.Descendants<Cell>())
{
    colKey = headerReferences[GetColumnName(c.CellReference)]; //what # column is this?
    cols[colKey] = GetValueFromCell(c, workbookPart); //put value in that column
}
The end result is 9,000-20,000 ticks per row, or 30 seconds for a 5,000-row spreadsheet. Workable, but not ideal.
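Because Stopwatch ticks are not a fixed unit of time, it can help to convert the per-row figures to milliseconds before comparing them. A minimal sketch: the `TickMath` helper is hypothetical, and the 10 MHz frequency in the usage note is only an assumed example value, since `Stopwatch.Frequency` varies by machine.

```csharp
using System;
using System.Diagnostics;

public static class TickMath
{
    // Converts a Stopwatch tick count to milliseconds.
    // 'frequency' is ticks per second, normally Stopwatch.Frequency.
    public static double TicksToMilliseconds(long ticks, long frequency)
    {
        return ticks * 1000.0 / frequency;
    }
}
```

For example, at an assumed frequency of 10,000,000 ticks per second, `TickMath.TicksToMilliseconds(20000, 10000000)` gives 2 ms. On the machine where the test runs, pass `Stopwatch.Frequency` instead. Note that the per-row figures only cover the code between `sw.Start()` and `sw.Stop()`, so the `Debug.WriteLine` call is not included in them.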

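This is where I stopped. Is there a way to make it faster? How can huge xlsx spreadsheets load lightning fast when the best I can do here is 30 seconds for 5k rows?

Dictionary did not help me at all, not even a 1% improvement. In any case, I need the results in a Hashtable for a legacy retrofit.

Appendix: referenced methods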

public static string GetColumnName(string cellReference)
        {
            // Match the column name portion of the cell name.
            Regex regex = new Regex("[A-Za-z]+");
            Match match = regex.Match(cellReference);

            return match.Value;
        }
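One cost that stands out in GetColumnName is that a new Regex is constructed for every cell. The following is a minimal sketch of an alternative, assuming cell references always look like ASCII letters followed by digits (for example "AB12"). The class and method names are hypothetical and not part of the Open XML SDK, and whether this makes a measurable difference here would need profiling.

```csharp
using System;

public static class CellRefSketch
{
    // Returns the leading ASCII letters of a reference, e.g. "AB12" yields "AB".
    // Scans characters directly instead of building a Regex on each call.
    public static string GetColumnNameFast(string cellReference)
    {
        int i = 0;
        while (i < cellReference.Length && IsAsciiLetter(cellReference[i]))
            i++;
        return cellReference.Substring(0, i);
    }

    // Converts column letters to a zero-based index: "A" is 0, "Z" is 25, "AA" is 26.
    public static int ColumnIndex(string columnName)
    {
        int index = 0;
        foreach (char ch in columnName)
            index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        return index - 1;
    }

    private static bool IsAsciiLetter(char ch)
    {
        return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
    }
}
```

A numeric column index like this could also serve as the key into the per-row Hashtable, which would avoid a separate lookup from column name to position. That is a design option, not something the original code does.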

public static string GetValueFromCell(Cell cell, WorkbookPart workbookPart)
        {
            int id;
            string cellValue = cell.InnerText;

            if (cellValue.Trim().Length > 0)
            {
                if (cell.DataType != null)
                {
                    switch (cell.DataType.Value)
                    {
                        case CellValues.SharedString:

                            Int32.TryParse(cellValue, out id);
                            SharedStringItem item = GetSharedStringItemById(workbookPart, id);
                            if (item.Text != null)
                            {
                                cellValue = item.Text.Text;
                            }
                            else if (item.InnerText != null)
                            {
                                cellValue = item.InnerText;
                            }
                            else if (item.InnerXml != null)
                            {
                                cellValue = item.InnerXml;
                            }
                            break;

                        case CellValues.Boolean:
                            switch (cellValue)
                            {
                                case "0":
                                    cellValue = "FALSE";
                                    break;
                                default:
                                    cellValue = "TRUE";
                                    break;
                            }
                            break;
                    }
                }

                else
                {
                    int excelDate;
                    if (Int32.TryParse(cellValue, out excelDate))
                    {

                        var styleIndex = (int)cell.StyleIndex.Value;

                        var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;
                        var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;
                        var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);

                        if (cellFormat.NumberFormatId != null)
                        {

                            var numberFormatId = cellFormat.NumberFormatId.Value;
                            var numberingFormat = numberingFormats.Cast<NumberingFormat>().SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);

                            if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("/yy")) //TODO here i should think of locales
                            {
                                DateTime dt = DateTime.FromOADate(excelDate);
                                cellValue = dt.ToString("MM/dd/yyyy");
                            }
                        }
                    }
                }
            }
            return cellValue;
        }
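In GetValueFromCell, every integer-looking cell repeats the same style lookup: `ElementAt(styleIndex)` followed by a `SingleOrDefault` scan over the numbering formats. Since many cells share a style index, the "is this a date format?" decision could be remembered per style index. The sketch below is an assumption-laden illustration: `DateStyleCache` is a hypothetical helper, and the lookup delegate stands in for the stylesheet logic shown above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers, per style index, whether the style is a date format.
public sealed class DateStyleCache
{
    private readonly Dictionary<uint, bool> _isDate = new Dictionary<uint, bool>();
    private readonly Func<uint, bool> _lookup;

    // 'lookup' performs the expensive stylesheet check for one style index.
    public DateStyleCache(Func<uint, bool> lookup)
    {
        _lookup = lookup;
    }

    // Runs the lookup the first time a style index is seen, then reuses the answer.
    public bool IsDateStyle(uint styleIndex)
    {
        bool result;
        if (!_isDate.TryGetValue(styleIndex, out result))
        {
            result = _lookup(styleIndex);
            _isDate[styleIndex] = result;
        }
        return result;
    }
}
```

One cache instance would be created per workbook and passed to the cell-reading code, so the stylesheet is consulted once per distinct style rather than once per cell.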

public static string GetCellValue(WorkbookPart wbPart, WorksheetPart wsPart, string addressName)
        {
            string value = String.Empty; //code from microsoft prefers null, but null is tough to work with

            // Use its Worksheet property to get a reference to the cell 
            // whose address matches the address you supplied.
            Cell theCell = wsPart.Worksheet.Descendants<Cell>().
              Where(c => c.CellReference == addressName).FirstOrDefault();

            // If the cell does not exist, return an empty string.
            if (theCell != null)
            {
                value = theCell.InnerText;

                // If the cell represents an integer number, you are done. 
                // For dates, this code returns the serialized value that 
                // represents the date. The code handles strings and 
                // Booleans individually. For shared strings, the code 
                // looks up the corresponding value in the shared string 
                // table. For Booleans, the code converts the value into 
                // the words TRUE or FALSE.
                if (theCell.DataType != null)
                {
                    switch (theCell.DataType.Value)
                    {
                        case CellValues.SharedString:

                            // For shared strings, look up the value in the shared strings table.
                            var stringTable = wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

                            // If the shared string table is missing, something is wrong. Return the index that is in the cell. 
                            //Otherwise, look up the correct text in the table.
                            if (stringTable != null)
                            {
                                value = stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText;
                            }
                            break;

                        case CellValues.Boolean:
                            switch (value)
                            {
                                case "0":
                                    value = "FALSE";
                                    break;
                                default:
                                    value = "TRUE";
                                    break;
                            }
                            break;
                    }
                }
            }
            return value;
        }

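XML faster than CSV? With all that overhead? Hmm, that is an assumption I would not start from. Anyway, did you measure in Debug or Release mode?

What does GetCellValue do? You only posted GetValueFromCell.

I agree it makes sense that bare CSV is easier to read than a multi-parameter XML structure, although I thought those structures might have some advantages. Anyway, you are probably right. I added the GetCellValue method, which is taken almost verbatim from MSDN. It retrieves values the same way as the other method, but in my case every cell placeholder in the row is queried, which makes it inefficient.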