C# text parsing: my parser skips a command
Tags: c#, text-parsing

I am trying to parse a text format. I want to mark inline code with backticks (`), like this. The rule should be that if you want to use a backtick inside an inline code element, you surround the inline code with double backticks, like this:
``mark inline code with backticks (`)``
For some reason, my parser seems to skip the double backticks completely. Here is the code of the function that does the inline code parsing:
    private string ParseInlineCode(string input)
    {
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '`' && input[i - 1] != '\\')
            {
                if (input[i + 1] == '`')
                {
                    string str = ReadToCharacter('`', i + 2, input);
                    while (input[i + str.Length + 2] != '`')
                    {
                        str += ReadToCharacter('`', i + str.Length + 3, input);
                    }
                    string tbr = "``" + str + "``";
                    str = str.Replace("&", "&amp;");
                    str = str.Replace("<", "&lt;");
                    str = str.Replace(">", "&gt;");
                    input = input.Replace(tbr, "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
                else
                {
                    string str = ReadToCharacter('`', i + 1, input);
                    input = input.Replace("`" + str + "`", "<code>" + str + "</code>");
                    i += str.Length + 13;
                }
            }
        }
        return input;
    }
If I put single backticks around something, it is correctly wrapped in <code> tags.

Here is a small snippet, tested in LINQPad, to help you get started:
    void Main()
    {
        string test = "here is some code `public void Method( )` but ``this is not code``";
        Regex r = new Regex( @"(`[^`]+`)" );
        MatchCollection matches = r.Matches( test );
        foreach( Match match in matches )
        {
            Console.Out.WriteLine( match.Value );
            // Guard against a match at index 0, where test[match.Index - 1] would throw.
            if( match.Index > 0 && test[match.Index - 1] == '`' )
                Console.Out.WriteLine( "NOT CODE" );
            else
                Console.Out.WriteLine( "CODE" );
        }
    }
Output:
`public void Method( )`
CODE
`this is not code`
NOT CODE
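
The snippet above only classifies matches. As a rough sketch of how the same regex idea could cover the rule from the question (double backticks allow a single backtick inside), the alternation below tries the double-backtick form first. The names InlineCodeRegex and Convert are hypothetical, the backslash-escape lookbehind mirrors the check in the question's code, and using WebUtility.HtmlEncode for escaping is my assumption rather than something from the thread.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answers.
public static class InlineCodeRegex
{
    // "``...``" is tried before "`...`" so a lone backtick inside a
    // double-backtick span is kept as content. The lookbehind skips an
    // opening backtick that is preceded by a backslash.
    private static readonly Regex Pattern = new Regex(
        @"(?<!\\)(?:``(.+?)``|`([^`]+)`)",
        RegexOptions.Singleline);

    public static string Convert(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        return Pattern.Replace(input, m =>
        {
            // Group 1 is the double-backtick content, group 2 the single-backtick content.
            string content = m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value;
            return "<code>" + WebUtility.HtmlEncode(content) + "</code>";
        });
    }
}
```

Unmatched backticks are simply left in the text, because the pattern finds no closing delimiter for them.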
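
In the while loop

    while (input[i + str.Length + 2] != '`')
    {
        str += ReadToCharacter('`', i + str.Length + 3, input);
    }

you are looking at the wrong index, i + str.Length + 2 instead of i + str.Length + 3. Assuming ReadToCharacter returns the text up to, but not including, the next backtick, index i + str.Length + 2 is always that backtick, so the condition is false right away and the loop body never runs. You also have to add the backtick you skipped back to the content. It should be

    while (input[i + str.Length + 3] != '`')
    {
        str += '`' + ReadToCharacter('`', i + str.Length + 3, input);
    }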
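But there are some more bugs in your code. If the first character of the input is a backtick, the following line causes an IndexOutOfRangeException, because input[i - 1] is read with i == 0:

    if (input[i] == '`' && input[i - 1] != '\\')

If the input contains an odd number of separated backticks and the last character of the input is a backtick, the following line causes an IndexOutOfRangeException, because input[i + 1] is read past the end of the string:

    if (input[i + 1] == '`')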
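
To illustrate one way of avoiding both out-of-range reads, here is a minimal sketch of a scanner that checks every index before using it and builds the result in a StringBuilder instead of calling Replace on the input while iterating over it. It is not the asker's method: the names InlineCodeScanner and Parse are hypothetical, ReadToCharacter is replaced by String.IndexOf, and leaving unclosed or empty spans as literal backticks is an assumption about the desired behavior.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the original question or answers.
public static class InlineCodeScanner
{
    public static string Parse(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var output = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // i > 0 is checked first, so input[i - 1] is never read at index 0.
            bool escaped = i > 0 && input[i - 1] == '\\';
            if (input[i] != '`' || escaped)
            {
                output.Append(input[i]);
                i++;
                continue;
            }

            // i + 1 is checked against Length, so a trailing backtick is safe.
            bool isDouble = i + 1 < input.Length && input[i + 1] == '`';
            string delimiter = isDouble ? "``" : "`";
            int start = i + delimiter.Length;
            int end = start < input.Length
                ? input.IndexOf(delimiter, start, StringComparison.Ordinal)
                : -1;

            if (end <= start)
            {
                // No closing delimiter, or an empty span: keep the backticks as text.
                output.Append(delimiter);
                i = start;
                continue;
            }

            output.Append("<code>")
                  .Append(WebUtility.HtmlEncode(input.Substring(start, end - start)))
                  .Append("</code>");
            i = end + delimiter.Length;
        }
        return output.ToString();
    }
}
```

Because the output is written to a separate buffer, the index i always refers to the original input, which removes the need for offsets such as str.Length + 13.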
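You should probably refactor the code into smaller methods instead of handling many cases in a single method, which is very prone to bugs. If you have not written unit tests for the code yet, I strongly suggest you do. And because parsers are not that easy to test, since you have to be prepared for all kinds of invalid input, you could have a look at Pex, a tool that automatically generates test cases for your code by analyzing all branching points and trying to take all possible code paths.
I quickly fired up Pex and ran it against the code. It found the IndexOutOfRangeException I had thought of, and some more. And of course Pex found the obvious NullReferenceException if the input is a null reference. Here are the inputs Pex found that cause exceptions: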
case1 = "`"
case2 = "\0`"
case3 = "\0``"
case4 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0````"
case5 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0`"
case6 = "\0`\0````````````\u0001``````````````\0\0\0\0\0\0\0\0\0\0\0```\0````````````\0\0\0\0\0\0\0\0\0\0``<\0\0`````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0\0``<\0\0```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````\0\0\0\0\0\0\0\0\0`\0```````````````"
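
Inputs like these can be kept as regression cases once the parser is fixed. The sketch below is only an illustration of that idea, not what Pex generates: CrashHarness and FindCrashingInputs are hypothetical names, and the parser is passed in as a Func<string, string> so the harness makes no assumption about how it is implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original question or answers.
public static class CrashHarness
{
    // Runs the parser over each input and records the ones that throw
    // an index or null-reference exception.
    public static List<string> FindCrashingInputs(
        Func<string, string> parser, IEnumerable<string> inputs)
    {
        var failures = new List<string>();
        foreach (string input in inputs)
        {
            try
            {
                parser(input);
            }
            catch (Exception ex) when (ex is IndexOutOfRangeException
                                    || ex is ArgumentOutOfRangeException
                                    || ex is NullReferenceException)
            {
                failures.Add(ex.GetType().Name + " for input of length " + (input?.Length ?? 0));
            }
        }
        return failures;
    }
}
```

A call such as CrashHarness.FindCrashingInputs(ParseInlineCode, new[] { "`", "\0`", "\0``" }) should return an empty list once the index checks are in place.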

Comments:
Wouldn't a regular expression be a better fit for this job?
I think you confused backticks with single quotes.
I had typed single quotes; fixed.