C# 如何拆分其列可能包含的csv,

C# 如何拆分其列可能包含的csv,,c#,.net,csv,C#,.net,Csv,给定 21016,7/31/2008 14:22,杰夫·达尔加斯,6/5/2011 22:21,http://stackoverflow.com,“Corvallis,或”,7679351,81,b437f461b3fd27387c5d8ab47a293d35,34 如何使用C#将上述信息拆分为以下字符串: 2 1016 7/31/2008 14:22 Geoff Dalgas 6/5/2011 22:21 http://stackoverflow.com Corvallis, OR 7679

给定

21016,7/31/2008 14:22,杰夫·达尔加斯,6/5/2011 22:21,http://stackoverflow.com,“Corvallis,或”,7679351,81,b437f461b3fd27387c5d8ab47a293d35,34

如何使用C#将上述信息拆分为以下字符串:

2
1016
7/31/2008 14:22
Geoff Dalgas
6/5/2011 22:21
http://stackoverflow.com
Corvallis, OR
7679
351
81
b437f461b3fd27387c5d8ab47a293d35
34
using(StreamReader SR = new StreamReader(FileName))
{
  while (SR.Peek() >-1)
    myStringArray = parseCSVLine(SR.ReadLine());
}

正如您可以看到其中一列所包含的,使用类似于CSV的库进行读取。它将处理带有引号的字段,并且由于已经存在很长时间,因此总体上可能比您的自定义解决方案更健壮。

使用类似于CSV读取的库。它将处理带有引号的字段,并且由于使用时间较长,总体上可能比您的自定义解决方案更健壮。

您可以在所有引号后面有偶数个引号的逗号上拆分

您还希望在for CSV格式上查看有关处理逗号的信息


有用的链接:

您可以在所有逗号后面有偶数引号的逗号上拆分

您还希望在for CSV格式上查看有关处理逗号的信息


有用的链接:

我看到,如果您在Excel中粘贴csv分隔文本并执行“文本到列”,它会要求您提供“文本限定符”。它默认为双引号,因此它将双引号内的文本视为文本。我想象Excel通过一次执行一个字符来实现这一点,如果遇到“文本限定符”,它将继续执行下一个“限定符”。您可能可以使用for循环和布尔值来实现这一点,以表示您是否在文本中

public string[] CsvParser(string csvText)
{
    List<string> tokens = new List<string>();

    int last = -1;
    int current = 0;
    bool inText = false;

    while(current < csvText.Length)
    {
        switch(csvText[current])
        {
            case '"':
                inText = !inText; break;
            case ',':
                if (!inText) 
                {
                    tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ',')); 
                    last = current;
                }
                break;
            default:
                break;
        }
        current++;
    }

    if (last != csvText.Length - 1) 
    {
        tokens.Add(csvText.Substring(last+1).Trim());
    }

    return tokens.ToArray();
}
public string[]CsvParser(string csvText)
{
列表标记=新列表();
int last=-1;
int电流=0;
bool-inText=false;
while(当前
我看到,如果您在Excel中粘贴csv分隔文本并执行“文本到列”,它会要求您提供“文本限定符”。它默认为双引号,因此它将双引号内的文本视为文本。我设想Excel通过在遇到“文本限定符”时每次执行一个字符来实现这一点,它将继续进入下一个“限定符”。您可能可以使用for循环和布尔值来实现这一点,以表示您是否在文本中

public string[] CsvParser(string csvText)
{
    List<string> tokens = new List<string>();

    int last = -1;
    int current = 0;
    bool inText = false;

    while(current < csvText.Length)
    {
        switch(csvText[current])
        {
            case '"':
                inText = !inText; break;
            case ',':
                if (!inText) 
                {
                    tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ',')); 
                    last = current;
                }
                break;
            default:
                break;
        }
        current++;
    }

    if (last != csvText.Length - 1) 
    {
        tokens.Add(csvText.Substring(last+1).Trim());
    }

    return tokens.ToArray();
}
public string[]CsvParser(string csvText)
{
列表标记=新列表();
int last=-1;
int电流=0;
bool-inText=false;
while(当前
使用Microsoft.VisualBasic.FileIO.TextFieldParser
类。这将处理对带分隔符的文件、
TextReader
Stream
的解析,其中一些字段用引号括起来,而另一些字段则不用引号括起来

例如:

using Microsoft.VisualBasic.FileIO;

string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

TextFieldParser parser = new TextFieldParser(new StringReader(csv));

// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");

string[] fields;

while (!parser.EndOfData)
{
    fields = parser.ReadFields();
    foreach (string field in fields)
    {
        Console.WriteLine(field);
    }
} 

parser.Close();
这将产生以下输出:

2 1016 7/31/2008 14:22 Geoff Dalgas 6/5/2011 22:21 http://stackoverflow.com Corvallis, OR 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34 2. 1016 7/31/2008 14:22 杰夫·达尔加斯 6/5/2011 22:21 http://stackoverflow.com 科瓦利斯,或 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34 有关更多信息,请参阅


您需要在添加引用.NET选项卡中添加对Microsoft.VisualBasic的引用。

使用
Microsoft.VisualBasic.FileIO.TextFieldParser
类。这将处理对分隔文件、
TextReader
Stream
的解析,其中一些字段用引号括起来,而另一些字段不用引号括起来

例如:

using Microsoft.VisualBasic.FileIO;

string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

TextFieldParser parser = new TextFieldParser(new StringReader(csv));

// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");

parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");

string[] fields;

while (!parser.EndOfData)
{
    fields = parser.ReadFields();
    foreach (string field in fields)
    {
        Console.WriteLine(field);
    }
} 

parser.Close();
这将产生以下输出:

2 1016 7/31/2008 14:22 Geoff Dalgas 6/5/2011 22:21 http://stackoverflow.com Corvallis, OR 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34 2. 1016 7/31/2008 14:22 杰夫·达尔加斯 6/5/2011 22:21 http://stackoverflow.com 科瓦利斯,或 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34 有关更多信息,请参阅


您需要在添加引用.NET选项卡中添加对Microsoft.VisualBasic的引用。

当.csv文件可能是逗号分隔字符串、逗号分隔带引号的字符串或两者的混乱组合时,解析.csv文件是一件棘手的事。我提出的解决方案允许这三种可能性中的任何一种。

我创建了一个方法ParseCsvRow()它从csv字符串返回一个数组。我首先处理字符串中的双引号,方法是将双引号上的字符串拆分为一个名为quotesArray的数组。Quoted string.csv文件仅在双引号为偶数时有效。列值中的双引号应替换为一对双引号(这是Excel的方法)。只要.csv文件满足这些要求,就可以预期分隔符逗号仅出现在双引号对的外部。双引号对内部的逗号是列值的一部分,在将.csv拆分为数组时应忽略

我的方法将通过只查看quotesArray的偶数索引来测试双引号对之外的逗号。它还将从列值的开始和结束处删除双引号

    public static string[] ParseCsvRow(string csvrow)
    {
        const string obscureCharacter = "ᖳ";
        if (csvrow.Contains(obscureCharacter)) throw new Exception("Error: csv row may not contain the " + obscureCharacter + " character");

        var unicodeSeparatedString = "";

        var quotesArray = csvrow.Split('"');  // Split string on double quote character
        if (quotesArray.Length > 1)
        {
            for (var i = 0; i < quotesArray.Length; i++)
            {
                // CSV must use double quotes to represent a quote inside a quoted cell
                // Quotes must be paired up
                // Test if a comma lays outside a pair of quotes.  If so, replace the comma with an obscure unicode character
                if (Math.Round(Math.Round((decimal) i/2)*2) == i)
                {
                    var s = quotesArray[i].Trim();
                    switch (s)
                    {
                        case ",":
                            quotesArray[i] = obscureCharacter;  // Change quoted comma seperated string to quoted "obscure character" seperated string
                            break;
                    }
                }
                // Build string and Replace quotes where quotes were expected.
                unicodeSeparatedString += (i > 0 ? "\"" : "") + quotesArray[i].Trim();
            }
        }
        else
        {
            // String does not have any pairs of double quotes.  It should be safe to just replace the commas with the obscure character
            unicodeSeparatedString = csvrow.Replace(",", obscureCharacter);
        }

        var csvRowArray = unicodeSeparatedString.Split(obscureCharacter[0]); 

        for (var i = 0; i < csvRowArray.Length; i++)
        {
            var s = csvRowArray[i].Trim();
            if (s.StartsWith("\"") && s.EndsWith("\""))
            {
                csvRowArray[i] = s.Length > 2 ? s.Substring(1, s.Length - 2) : "";  // Remove start and end quotes.
            }
        }

        return csvRowArray;
    }
publicstaticstring[]ParseCsvRow(stringcsvrow)
{
常量字符串蒙蔽字符=”ᖳ";
如果(csvrow.Contains(
string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";

using (var p = ChoCSVReader.LoadText(csv)
    )
{
    Console.WriteLine(p.Dump());
}
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
    /// <summary>
    /// Returns a collection of strings that are derived by splitting the given source string at
    /// characters given by the 'delimiter' parameter.  However, a substring may be enclosed between
    /// pairs of the 'qualifier' character so that instances of the delimiter can be taken as literal
    /// parts of the substring.  The method was originally developed to split comma-separated text
    /// where quotes could be used to qualify text that contains commas that are to be taken as literal
    /// parts of the substring.  For example, the following source:
    ///     A, B, "C, D", E, "F, G"
    /// would be split into 5 substrings:
    ///     A
    ///     B
    ///     C, D
    ///     E
    ///     F, G
    /// When enclosed inside of qualifiers, the literal for the qualifier character may be represented
    /// by two consecutive qualifiers.  The two consecutive qualifiers are distinguished from a closing
    /// qualifier character.  For example, the following source:
    ///     A, "B, ""C"""
    /// would be split into 2 substrings:
    ///     A
    ///     B, "C"
    /// </summary>
    /// <remarks>Originally based on: https://stackoverflow.com/a/43284485/2998072</remarks>
    /// <param name="source">The string that is to be split</param>
    /// <param name="delimiter">The character that separates the substrings</param>
    /// <param name="qualifier">The character that is used (in pairs) to enclose a substring</param>
    /// <param name="toTrim">If true, then whitespace is removed from the beginning and end of each
    /// substring.  If false, then whitespace is preserved at the beginning and end of each substring.
    /// </param>
    public static List<String> SplitQualified(this String source, Char delimiter, Char qualifier,
                                Boolean toTrim)
    {
        // Avoid throwing exception if the source is null
        if (String.IsNullOrEmpty(source))
            return new List<String> { "" };

        var results = new List<String>();
        var result = new StringBuilder();
        Boolean inQualifier = false;

        // The algorithm is designed to expect a delimiter at the end of each substring, but the
        // expectation of the caller is that the final substring is not terminated by delimiter.
        // Therefore, we add an artificial delimiter at the end before looping through the source string.
        String sourceX = source + delimiter;

        // Loop through each character of the source
        for (var idx = 0; idx < sourceX.Length; idx++)
        {
            // If current character is a delimiter
            // (except if we're inside of qualifiers, we ignore the delimiter)
            if (sourceX[idx] == delimiter && inQualifier == false)
            {
                // Terminate the current substring by adding it to the collection
                // (trim if specified by the method parameter)
                results.Add(toTrim ? result.ToString().Trim() : result.ToString());
                result.Clear();
            }
            // If current character is a qualifier
            else if (sourceX[idx] == qualifier)
            {
                // ...and we're already inside of qualifier
                if (inQualifier)
                {
                    // check for double-qualifiers, which is escape code for a single
                    // literal qualifier character.
                    if (idx + 1 < sourceX.Length && sourceX[idx + 1] == qualifier)
                    {
                        idx++;
                        result.Append(sourceX[idx]);
                        continue;
                    }
                    // Since we found only a single qualifier, that means that we've
                    // found the end of the enclosing qualifiers.
                    inQualifier = false;
                    continue;
                }
                else
                    // ...we found an opening qualifier
                    inQualifier = true;
            }
            // If current character is neither qualifier nor delimiter
            else
                result.Append(sourceX[idx]);
        }

        return results;
    }
    [TestMethod()]
    public void SplitQualified_00()
    {
        // Example with no substrings
        String s = "";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_00A()
    {
        // just a single delimiter
        String s = ",";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "", "" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_01()
    {
        // Example with no whitespace or qualifiers
        String s = "1,2,3,1,2,3";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_02()
    {
        // Example with whitespace and no qualifiers
        String s = " 1, 2 ,3,  1  ,2\t,   3   ";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_03()
    {
        // Example with whitespace and no qualifiers
        String s = " 1, 2 ,3,  1  ,2\t,   3   ";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(
            new List<String> { " 1", " 2 ", "3", "  1  ", "2\t", "   3   " },
            substrings);
    }
    [TestMethod()]
    public void SplitQualified_04()
    {
        // Example with no whitespace and trivial qualifiers.
        String s = "1,\"2\",3,1,2,\"3\"";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);

        s = "\"1\",\"2\",3,1,\"2\",3";
        substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_05()
    {
        // Example with no whitespace and qualifiers that enclose delimiters
        String s = "1,\"2,2a\",3,1,2,\"3,3a\"";
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2,2a", "3", "1", "2", "3,3a" },
                                substrings);

        s = "\"1,1a\",\"2,2b\",3,1,\"2,2c\",3";
        substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1,1a", "2,2b", "3", "1", "2,2c", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_06()
    {
        // Example with qualifiers enclosing whitespace but no delimiter
        String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_07()
    {
        // Example with qualifiers enclosing whitespace but no delimiter
        String s = "\" 1 \",\"2 \",3,1,2,\"\t3\t\"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", "2 ", "3", "1", "2", "\t3\t" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_08()
    {
        // Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 \"  ,  3,1, 2 ,\"  3  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_09()
    {
        // Example with qualifiers enclosing whitespace but no delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 \"  ,  3,1, 2 ,\"  3  \"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2   ", "  3", "1", " 2 ", "  3  " },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_10()
    {
        // Example with qualifiers enclosing whitespace and delimiter
        String s = "\" 1 \",\"2 , 2b \",3,1,2,\"  3,3c  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2 , 2b", "3", "1", "2", "3,3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_11()
    {
        // Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
        String s = "\" 1 \", \"2 , 2b \"  ,  3,1, 2 ,\"  3,3c  \"";
        // whitespace should be preserved
        var substrings = s.SplitQualified(',', '"', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 , 2b   ", "  3", "1", " 2 ", "  3,3c  " },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_12()
    {
        // Example with tab characters between delimiters
        String s = "\t1,\t2\t,3,1,\t2\t,\t3\t";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_13()
    {
        // Example with newline characters between delimiters
        String s = "\n1,\n2\n,3,1,\n2\n,\n3\n";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_14()
    {
        // Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
        String s = "\" 1 \",\"\"\"2 , 2b \"\"\",3,1,2,\"  \"\"3,3c  \"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "\"2 , 2b \"", "3", "1", "2", "\"3,3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_14A()
    {
        // Example with qualifiers enclosing whitespace and delimiter, plus escaped qualifier
        String s = "\"\"\"1\"\"\"";
        // whitespace should be removed
        var substrings = s.SplitQualified(',', '"', true);
        CollectionAssert.AreEquivalent(new List<String> { "\"1\"" },
                                substrings);
    }


    [TestMethod()]
    public void SplitQualified_15()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with no whitespace or qualifiers
        String s = "1|2|3|1|2,2f|3";
        var substrings = s.SplitQualified('|', '#', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2", "3", "1", "2,2f", "3" }, substrings);
    }
    [TestMethod()]
    public void SplitQualified_16()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with qualifiers enclosing whitespace and delimiter
        String s = "# 1 #|#2 | 2b #|3|1|2|#  3|3c  #";
        // whitespace should be removed
        var substrings = s.SplitQualified('|', '#', true);
        CollectionAssert.AreEquivalent(new List<String> { "1", "2 | 2b", "3", "1", "2", "3|3c" },
                                substrings);
    }
    [TestMethod()]
    public void SplitQualified_17()
    {
        // Instead of comma-delimited and quote-qualified, use pipe and hash

        // Example with qualifiers enclosing whitespace and delimiter; also whitespace btwn delimiters
        String s = "# 1 #| #2 | 2b #  |  3|1| 2 |#  3|3c  #";
        // whitespace should be preserved
        var substrings = s.SplitQualified('|', '#', false);
        CollectionAssert.AreEquivalent(new List<String> { " 1 ", " 2 | 2b   ", "  3", "1", " 2 ", "  3|3c  " },
                                substrings);
    }