C#: Create a 2D array from a CSV and get word counts for a specified column


I have a CSV file that looks like this:

,Location_Code,Location_Desc,Type_Code,Fault_type,Prod_Number,Model,Causer,Auditor,Prio,Capture_Date,Steer,Engine,Country,Current shift number,VIN,Comment,Shift,Year,Fault location C_Code,Fault location C_Desc,Fault type C_Code,Fault type C_Desc,Comment R,Baumuster Sales desc.,Baumuster Technical desc.,T24
0,09122,Engine,42,Poor fit,7117215,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0092,55SWF8DB7KU316971,,A,2019,,,,,,C 300,205 E20 G,
1,09122,Engine,42,Poor fit,7117235,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0122,55SWF8DB2KU316991,,A,2019,,,,,,C 300,205 E20 G,
2,09122,Transmission,42,Poor fit,7117237,W205,Final 3,"Plant 1, WSA",0,2019-04-05,1,83,705,T1220190404T0126,55SWF8DB6KU316993,,A,2019,,,,,,C 300,205 E20 G,

I want to write code that tokenizes the values of a selected column and then returns the word counts for that column header as dictionary-style key-value pairs. I also want the counts sorted by value in descending order. For example:

Location_Desc

Engine: 2

Transmission: 1

Here is the code I have so far:

            int colNumber;
            for(colNumber=0; colNumber<columns.Length; colNumber++)
            {
                if ( columns[colNumber].Equals(columnHeader))
                {
                    break;
                }
            }

            Debug.WriteLine("Column Number: " + colNumber);
            for(int i=0; i<inputCsv.Length; i++)
            {
                string[] row = inputCsv[i].Split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
                string column = row[colNumber];
                Debug.WriteLine(row.ToString());
            }
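With the for loop I can find the column header, but I can't get the split to ignore commas inside quotes, and I can't work out how to get the values under that column header (what pandas in Python would call a Series).

Any help is much appreciated.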

I would probably store your counts in a dictionary rather than a 2D array. That makes it much easier to access each column.
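If you want to see the dictionary idea without any library first, here is a minimal sketch. It is an assumption-laden illustration, not a full CSV parser: the shortened sample rows and the `CountColumn` helper are hypothetical, and the pattern is the one from the question with the group made non-capturing. Note that `string.Split` does not interpret its argument as a regular expression, so the sketch uses `Regex.Split` instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Split on commas followed by an even number of quotes, i.e. commas that
// sit outside a quoted field. The group is non-capturing because
// Regex.Split would otherwise add captured text to the result array.
var splitter = new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

// Shortened, hypothetical sample in the same shape as the question's file.
string[] inputCsv =
{
    ",Location_Code,Location_Desc,Fault_type,Auditor",
    "0,09122,Engine,Poor fit,\"Plant 1, WSA\"",
    "1,09122,Engine,Poor fit,\"Plant 1, WSA\"",
    "2,09122,Transmission,Poor fit,\"Plant 1, WSA\"",
};

// Hypothetical helper: counts how often each value occurs in one column.
Dictionary<string, int> CountColumn(string[] lines, string columnHeader)
{
    string[] columns = splitter.Split(lines[0]);
    int colNumber = Array.IndexOf(columns, columnHeader);
    if (colNumber < 0)
    {
        throw new ArgumentException($"Column '{columnHeader}' not found.");
    }

    var counts = new Dictionary<string, int>();
    foreach (string line in lines.Skip(1))
    {
        string[] row = splitter.Split(line);
        string value = row[colNumber].Trim('"'); // drop the surrounding quotes
        if (value.Length == 0)
        {
            continue; // skip empty cells
        }
        counts[value] = counts.TryGetValue(value, out int n) ? n + 1 : 1;
    }
    return counts;
}

var locationCounts = CountColumn(inputCsv, "Location_Desc");

// Sort by count, highest first, and print "value: count".
foreach (var pair in locationCounts.OrderByDescending(p => p.Value))
{
    Console.WriteLine($"{pair.Key}: {pair.Value}");
}
```

For the sample rows this prints `Engine: 2` followed by `Transmission: 1`. The regex approach does not cope with line breaks inside quoted fields and leaves doubled quotes unescaped, which is one reason a CSV library is the safer choice.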

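Using the CsvHelper NuGet package, we can create a class to model the CSV file. The only thing to watch is choosing the right data type for each column; the types I chose may not suit your case. See also the API documentation.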

Then we can use GetRecords to read the records into an IEnumerable:
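As a rough sketch of that idea (not the class from the original answer), the example below models only four of the columns. The `FaultRecord` class, its property names, and the shortened in-memory sample are assumptions made for illustration; it relies on CsvHelper's `Name` attribute to map headers to properties and reads from a `StringReader` so it needs no file on disk.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration.Attributes;

// Shortened, hypothetical sample in the same shape as the question's file.
const string sample =
    ",Location_Code,Location_Desc,Fault_type,Auditor\n" +
    "0,09122,Engine,Poor fit,\"Plant 1, WSA\"\n" +
    "1,09122,Engine,Poor fit,\"Plant 1, WSA\"\n" +
    "2,09122,Transmission,Poor fit,\"Plant 1, WSA\"\n";

using var reader = new StringReader(sample);
using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

// GetRecords is lazy, so materialize the records while the reader is open.
List<FaultRecord> records = csv.GetRecords<FaultRecord>().ToList();

// Count the values of one column, highest count first.
var locationCounts = records
    .GroupBy(r => r.LocationDesc)
    .OrderByDescending(g => g.Count());

foreach (var group in locationCounts)
{
    Console.WriteLine($"{group.Key}: {group.Count()}");
}

// Hypothetical partial model: columns without a property are not mapped.
public class FaultRecord
{
    // Kept as a string so the leading zero in "09122" is preserved.
    [Name("Location_Code")]
    public string LocationCode { get; set; }

    [Name("Location_Desc")]
    public string LocationDesc { get; set; }

    [Name("Fault_type")]
    public string FaultType { get; set; }

    [Name("Auditor")]
    public string Auditor { get; set; }
}
```

Because the library does the parsing, the quoted `"Plant 1, WSA"` value comes back as a single field without any regex work.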

Output (the full listing appears at the end of this page):

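Update: If you want to support multiple CSV files without maintaining a class for the columns (which also avoids reflection), you could use the following generic solution: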

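using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

var path = "C:\\data.csv";

// Column name -> (value -> number of occurrences)
var recordCounts = new Dictionary<string, Dictionary<string, long>>();

using (var reader = new StreamReader(path))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
    // Read the first row and treat it as the header.
    csv.Read();
    csv.ReadHeader();
    // Newer CsvHelper releases may expose this as csv.HeaderRecord instead.
    var headerRow = csv.Context.HeaderRecord;

    // The first column has no name in the file, so give it one for display.
    if (string.IsNullOrEmpty(headerRow[0]))
    {
        headerRow[0] = "RowNumber";
    }

    foreach (var header in headerRow)
    {
        recordCounts.Add(header, new Dictionary<string, long>());
    }

    while (csv.Read())
    {
        foreach (var header in headerRow)
        {
            // Look the field up under its original (empty) name in the file.
            var headerKey = header == "RowNumber" ? string.Empty : header;
            var columnValue = csv.GetField(headerKey);
            if (!string.IsNullOrEmpty(columnValue))
            {
                var count = recordCounts[header].GetValueOrDefault(columnValue, 0) + 1;
                recordCounts[header][columnValue] = count;
            }
        }
    }
}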
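It combines a header-reading approach with the field-access method from the CsvHelper documentation. Both resources were helpfully suggested in the comments.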


You can then combine this solution with the LINQ dictionary-sorting query and printing code from earlier to produce similar results.
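A minimal sketch of what that sorting and printing step could look like for the nested dictionary is shown here. The `FormatCounts` helper and the small hand-filled `recordCounts` are hypothetical stand-ins for illustration; in practice you would pass in the dictionary built by the generic reader.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the dictionary filled by the generic reader.
var recordCounts = new Dictionary<string, Dictionary<string, long>>
{
    ["Location_Desc"] = new Dictionary<string, long>
    {
        ["Transmission"] = 1,
        ["Engine"] = 2,
    },
    ["Comment"] = new Dictionary<string, long>(),
};

// Hypothetical helper: one block of lines per column, values sorted by
// count in descending order.
List<string> FormatCounts(Dictionary<string, Dictionary<string, long>> counts)
{
    var lines = new List<string>();
    foreach (var column in counts)
    {
        lines.Add($"Column: {column.Key}");
        if (column.Value.Count == 0)
        {
            lines.Add("No values found");
        }
        foreach (var pair in column.Value.OrderByDescending(p => p.Value))
        {
            lines.Add($"Value: {pair.Key}, Count: {pair.Value}");
        }
        lines.Add(string.Empty); // blank line between columns
    }
    return lines;
}

foreach (var line in FormatCounts(recordCounts))
{
    Console.WriteLine(line);
}
```

Sorting at print time keeps the counting dictionary simple; a `Dictionary` does not preserve any sort order of its own, so ordering it once at the end is usually the easier design.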

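There are plenty of libraries that can read CSV files; consider using one of them to read yours. It will save you a lot of time.

"I want to write code that tokenizes the values of a selected column and then returns the word counts for that column header as dictionary-style key-value pairs." I have a question about this part of your question. For the Fault_type column in the sample, is the expected result (1) one pair, Poor fit: 3, or (2) two pairs, Poor: 3 and fit: 3?

@IliarTurdushev It's one pair.

Output of the class-based approach:
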
Column: RowNumber
Value: 0, Count: 1
Value: 1, Count: 1
Value: 2, Count: 1

Column: LocationCode
Value: 09122, Count: 3

Column: LocationDesc
Value: Engine, Count: 2
Value: Transmission, Count: 1

Column: TypeCode
Value: 42, Count: 3

Column: FaultType
Value: Poor fit, Count: 3

Column: ProdNumber
Value: 7117215, Count: 1
Value: 7117235, Count: 1
Value: 7117237, Count: 1

Column: Model
Value: W205, Count: 3

Column: Causer
Value: Final 3, Count: 3

Column: Auditor
Value: Plant 1, WSA, Count: 3

Column: Prio
Value: 0, Count: 3

Column: CaptureDate
Value: 5/04/2019 12:00:00 AM, Count: 3

Column: Steer
Value: 1, Count: 3

Column: Engine
Value: 83, Count: 3

Column: Country
Value: 705, Count: 3

Column: CurrentShiftNumber
Value: T1220190404T0092, Count: 1
Value: T1220190404T0122, Count: 1
Value: T1220190404T0126, Count: 1

Column: VIN
Value: 55SWF8DB7KU316971, Count: 1
Value: 55SWF8DB2KU316991, Count: 1
Value: 55SWF8DB6KU316993, Count: 1

Column: Comment
No values found

Column: Shift
Value: A, Count: 3

Column: Year
Value: 2019, Count: 3

Column: FaultLocationCCode
No values found

Column: FaultLocationCDesk
No values found

Column: FaultTypeCCode
No values found

Column: FaultTypeCDesc
No values found

Column: CommentR
No values found

Column: BaumusterSalesDesc
Value: C 300, Count: 3

Column: BaumusterTechnicalDesc
Value: 205 E20 G, Count: 3

Column: T24
No values found