C# regex: capture a pattern, but ignore it inside quotes

c# regex

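So, in a C# regex, what I basically need to do is split a string wherever a certain pattern is found, but ignore the pattern when it is enclosed in double quotes inside the string.

For example: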

string text = "abc , def , a\" , \"d , oioi";
string pattern = "[ \t]*,[ \t]*";

string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);

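Result needed after splitting (3 splits, 4 strings):

    {"abc",
     "def",
     "a\" , \"d",
     "oioi"}

Actual result (4 splits, 5 strings):

    {"abc",
     "def",
     "a\"",
     "\"d",
     "oioi"}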

Another example:

string text = "a%2% 6y % \"ad%t6%&\" %(7y) %";
string pattern = "%";

string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);

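Result needed after splitting (5 splits, 6 strings):

    {"a",
     "2",
     " 6y ",
     " \"ad%t6%&\" ",
     "(7y) ",
     ""}

Actual result (7 splits, 8 strings):

    {"a",
     "2",
     " 6y ",
     " \"ad",
     "t6",
     "&\" ",
     "(7y) ",
     ""}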

A third example, illustrating a tricky split where only the first case should be ignored:

string text = "!!\"!!\"!!\"";
string pattern = "!!";

string[] result = Regex.Split(text, pattern, RegexOptions.ECMAScript);

Result needed after splitting (2 splits, 3 strings):

    {"",
     "\"!!\"",
     "\""}

Actual result (3 splits, 4 strings):

    {"",
     "\"",
     "\"",
     "\""}

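So, how do I turn a given pattern into a new pattern that produces the desired result?

Side note: if you are going to mark someone's question as a duplicate (which I have no objection to), at least point them to the right answer rather than to a random post (yes, I am looking at you, Mr. Avinash Raj).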

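The rules are more or less the same as for a CSV line, except that:

the delimiter can be a single character, but it can also be a string or a pattern (in these last cases, an item must be trimmed if it starts or ends with the last or first possible tokens of the delimiter pattern)
an orphan quote is allowed in the last item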

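First of all, when you want to separate items (to split) using some advanced rules, the split method is no longer a good option. The split method only suits simple situations, which yours is not. (Even without orphan quotes, using a split with

,(?=(?:[^"]*"[^"]*")*[^"]*$)

is a very bad idea, because the lookahead rescans the rest of the string at every candidate delimiter, so the number of steps needed to parse the string grows much faster than the string itself.)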
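For comparison, here is a minimal sketch of the lookahead-based split that the paragraph above warns against. The class name LookaheadSplit and the method SplitOutsideQuotes are hypothetical, and the sketch assumes the usual form of the lookahead, which accepts a delimiter only when an even number of quotes remains before the end of the string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Splits on `delimiter` (a regex fragment) only where the rest of the
    // string contains an even number of double quotes.
    public static string[] SplitOutsideQuotes(string text, string delimiter)
    {
        // Verbatim string: each "" below is a single quote character in the regex.
        string pattern = "(?:" + delimiter + ")"
                       + @"(?=(?:[^""]*""[^""]*"")*[^""]*$)";
        return Regex.Split(text, pattern);
    }
}
```

With balanced quotes this behaves as hoped: the first example from the question comes back as four strings. On the third example, which ends with an orphan quote, the parity check is thrown off and the only split lands on the `!!` inside the quoted part, which is consistent with the advice to capture items instead of splitting.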

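The other way is to capture the items. This is simpler and faster. (As a bonus, it checks the format of the whole string at the same time.)

Here is a general way to do it: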

^
(?>
  (?:delimiter | start_of_the_string)
  (
      simple_part
      (?>
          (?: quotes | delim_first_letter_1 | delim_first_letter_2 | etc. )
          simple_part
      )*
  )
)+
$

An example with \s*,\s* as the delimiter:

^
# atomic group for one delimiter and one item
(?>
    (?: \s*,\s* | ^ ) # delimiter or start of the string
                      # (optionally change "^" to "^ \s*" to trim the first item)

    # capture group 1 for the item
    (   # simple part of the item (may be empty):
        [^\s,"]* # anything that is neither the quote character nor one of the
                 # possible first characters of the delimiter
        # edge case followed by a simple part
        (?>
            (?: # edge cases
                " [^"]* (?:"|$) # a quoted part or an orphan quote in the last item (*)
              |   # OR
                (?> \s+ ) # start of the delimiter
                (?!,)     # but not the delimiter
            )

            [^\s,"]* # simple part
        )*
    )
)+
$

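The pattern is designed for the Regex.Match method, since it describes the whole string. All the items are available in group 1, because the .NET regex flavor is able to store every capture of a repeated group.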

This example can easily be adapted to all cases.
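As a rough sketch of how the pattern above might be used from C#, the snippet below passes it to Regex.Match with RegexOptions.IgnorePatternWhitespace and reads every item from the captures of group 1. The names QuotedItemParser and ParseItems are hypothetical, chosen only for illustration; the pattern is the one from the answer with its comments removed and each quote doubled for the verbatim string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedItemParser
{
    // Delimiter is \s*,\s* ; a "" inside the verbatim string is one quote in the regex.
    private const string Pattern = @"
        ^
        (?>
            (?: \s*,\s* | ^ )          # delimiter or start of the string
            (                          # group 1: one item
                [^\s,""]*
                (?>
                    (?: "" [^""]* (?:""|$)    # quoted part, or orphan quote in the last item
                      | (?> \s+ ) (?!,)       # whitespace that does not start a delimiter
                    )
                    [^\s,""]*
                )*
            )
        )+
        $";

    public static string[] ParseItems(string text)
    {
        Match match = Regex.Match(text, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success)
            throw new FormatException("The input does not match the expected item format.");

        // .NET keeps every capture of a repeated group, not just the last one.
        return match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }
}
```

Called on the first example from the question, ParseItems("abc , def , a\" , \"d , oioi") should return the four items abc, def, a" , "d and oioi. Throwing a FormatException on a failed match is a design choice of this sketch, not something the answer prescribes.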

(*) If you want to allow escaped quotes inside the quoted parts, you can use

simple_part (?: edge_case simple_part )*

once more, in place of

" [^"]* (?:"|$)

that is:

" [^\\"]* (?: \\. [^\\"]* )* (?:"|$)
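The escaped-quote variant can be sketched in C# as follows. This is only an illustration under the same assumptions as the \s*,\s* example: the names EscapedQuoteItemParser and ParseItems are hypothetical, and a backslash is assumed to escape any single character inside a quoted part.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EscapedQuoteItemParser
{
    // Same structure as the \s*,\s* example; only the quoted part changes so that
    // a backslash followed by any character is skipped as a unit.
    private const string Pattern = @"
        ^
        (?>
            (?: \s*,\s* | ^ )
            (
                [^\s,""]*
                (?>
                    (?: "" [^\\""]* (?: \\. [^\\""]* )* (?:""|$)   # quoted part with escapes
                      | (?> \s+ ) (?!,)
                    )
                    [^\s,""]*
                )*
            )
        )+
        $";

    public static string[] ParseItems(string text)
    {
        Match match = Regex.Match(text, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success)
            throw new FormatException("The input does not match the expected item format.");

        return match.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }
}
```

For an input such as a , "b \" , c" , d the escaped quote no longer ends the quoted part, so the middle item keeps its embedded delimiter.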
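
I think this is a two-step process, and trying to do it in a single step is overthinking it.

Steps
1. Simply remove any quotes from the string.
2. Split on the target character.

Process example
I will split on , for step 2.

using System.Linq;
using System.Text.RegularExpressions;

var data = string.Format("abc , def , a{0}, {0}d , oioi", "\"");

// `\x22` is the hex escape for a quote ("), which is easier to read than an escaped quote in C# source.
var stage1 = Regex.Replace(data, @"\x22", string.Empty);

// abc , def , a", "d , oioi
// becomes
// abc , def , a, d , oioi

// Step 2: capture each run of characters that is neither whitespace nor a comma.
var items = Regex.Matches(stage1, @"([^\s,]+)[\s,]*")
     .OfType<Match>()
     .Select(mt => mt.Groups[1].Value)
     .ToList();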

Result: the five items abc, def, a, d and oioi.

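Did you delete your previous question? You have reposted the same question that was closed as a duplicate. If you think your question is not a duplicate of that one, please explain why in the question body.

@AvinashRaj. Yes, I did delete it, because no amount of editing would have "un-duplicated" it. As the third example shows, the answers provided there will not work.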