C# regex to check whether a string occurs within a specific pattern that may contain nested parentheses
I have been trying to write code that checks whether a given string contains certain strings in a specific pattern. To be precise, for example:
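string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
List<string> checkList = new List<string>{"homo sapiens","human","man","woman"};
From the list above, I need to pick out "homo sapiens", "human" and "woman", because they follow the pattern: the string is directly preceded by ~, or it is one of the strings inside parentheses that start with ~.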
So far I have come up with:
string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
List<string> checkList = new List<string>{"homo sapiens","human","man","woman"};
var prunedList = new List<string>();
foreach(var term in checkList)
{
    var pattern = @"~(\s)*(\(\s*)?(\(?\w\s*\)?)*" + term + @"(\s*\))?";
    Match m = Regex.Match(mainString, pattern);
    if(m.Success)
    {
        prunedList.Add(term);
    }
}
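But this pattern does not work for all cases...
Can anyone tell me how to do this?

It's not possible using regular expressions. You should abandon the idea of using them and use normal string operations such as IndexOf.

I wrote a simple parser that works well for the example you gave.
I don't know what the expected behavior is for a string that ends in this pattern: ~(some words (i.e., an opening parenthesis after ~ with no closing parenthesis).
I'm sure you could clean this up some: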
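// Requires: using System.Collections.Generic; using System.Linq;
//           using System.Text; using System.Text.RegularExpressions;
private bool Contains(string source, string given)
{
    return ExtractValidPhrases(source).Any(p => RegexMatch(p, given));
}

private bool RegexMatch(string phrase, string given)
{
    // \b...\b matches whole words only, so "man" is not found inside "human" or "woman".
    // Regex.Escape keeps any regex metacharacters in the search term literal.
    return Regex.IsMatch(phrase, string.Format(@"\b{0}\b", Regex.Escape(given)), RegexOptions.IgnoreCase);
}

// Yields every phrase introduced by '~': either a single word, or everything
// up to the first space that follows the closing parenthesis of a "~(" group.
private IEnumerable<string> ExtractValidPhrases(string source)
{
    bool valid = false;                   // true while inside a '~' phrase
    var parentheses = new Stack<char>();  // parentheses that are currently open
    var phrase = new StringBuilder();
    for (int i = 0; i < source.Length; i++)
    {
        if (valid) phrase.Append(source[i]);
        switch (source[i])
        {
            case '~':
                valid = true;
                break;
            case ' ':
                // A space outside all parentheses ends the current phrase.
                if (valid && parentheses.Count == 0)
                {
                    yield return phrase.ToString();
                    phrase.Clear();
                }
                if (parentheses.Count == 0) valid = false;
                break;
            case '(':
                if (valid)
                {
                    parentheses.Push('(');
                }
                break;
            case ')':
                // Guard against a stray ')' so Pop() never throws on an empty stack.
                if (valid && parentheses.Count > 0)
                {
                    parentheses.Pop();
                }
                break;
        }
    }
    //if (valid && !parentheses.Any()) yield return phrase.ToString();
    if (valid) yield return phrase.ToString();
}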
Here are the tests I used:
// NUnit tests
[Test]
[TestCase("Homo Sapiens", true)]
[TestCase("human", true)]
[TestCase("woman", true)]
[TestCase("man", false)]
public void X(string given, bool shouldBeFound)
{
    const string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
    Assert.AreEqual(shouldBeFound, Contains(mainString, given));
}
[Test]
public void Y()
{
    const string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
    var checkList = new List<string> {"homo sapiens", "human", "man", "woman"};
    var expected = new List<string> { "homo sapiens", "human", "woman" };
    var filtered = checkList.Where(s => Contains(mainString, s));
    CollectionAssert.AreEquivalent(expected, filtered);
}
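
Parenthesis checking is a context-free language (or grammar), which needs a stack to be checked. Regular expressions are suited to regular languages. They have no memory, so they cannot be used for this purpose.
To check this, you need to scan the string and count the parentheses:
- initialize count to 0
- scan the string
- if the current character is ( then increment count
- if the current character is ) then decrement count
- if count is negative, raise an error saying the parentheses are inconsistent; e.g., )(
- at the end, if count is positive, there are some unclosed parentheses
- if count is zero, the test is passed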
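public static bool CheckParentheses(string input)
{
    int count = 0;
    foreach (var ch in input)
    {
        if (ch == '(') count++;
        if (ch == ')') count--;
        // if a parenthesis is closed without having been opened, return false
        if (count < 0)
            return false;
    }
    // in the end, the test is passed only if count is zero
    return count == 0;
}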
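You see, since regular expressions are not capable of counting, they cannot check such a pattern.

The language of balanced parentheses is not regular, so you cannot accomplish what you want with regular expressions. A better approach is to use traditional string parsing with two counters (one for open parens and one for close parens), or a stack, to build a model similar to a pushdown automaton.
To get a better idea of the concept, check out PDAs on Wikipedia.
Below is an example that uses a stack to get the strings inside the outermost parens (pseudocode):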
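Stack stack = new Stack();
char[] stringToParse = originalString.toCharArray();
for (int i = 0; i < stringToParse.Length; i++)
{
    if (stringToParse[i] == '(')
        stack.push(i);
    if (stringToParse[i] == ')')
        string StringBetweenParens = originalString.GetSubstring(stack.pop(), i);
}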
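Of course, this is a contrived example and it would take some work to do more serious parsing, but it gives the basic idea of how to go about it. I have left out things such as: proper function names (I don't feel like looking them up right now), and how to get the text inside nested parens, e.g. extracting "inner" from the string "(outer (inner))" (the function would return "out…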
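For readers who want something that compiles, the stack idea can be sketched in C# roughly as follows. This is only a sketch under assumptions: the class name ParenSketch and the method name ExtractParenthesized are made up for illustration, and ignoring an unmatched ) is a choice made here, not something the answer specifies.

```csharp
using System;
using System.Collections.Generic;

public static class ParenSketch
{
    // Hypothetical helper: returns the text inside every balanced pair of
    // parentheses. A pair is reported when its ')' is reached, so inner
    // pairs come before the pairs that enclose them.
    public static List<string> ExtractParenthesized(string input)
    {
        var openPositions = new Stack<int>();  // indices of '(' not yet closed
        var results = new List<string>();

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '(')
            {
                openPositions.Push(i);
            }
            else if (input[i] == ')' && openPositions.Count > 0)
            {
                int open = openPositions.Pop();
                // Take the text between the pair, excluding the parentheses.
                results.Add(input.Substring(open + 1, i - open - 1));
            }
            // Assumption: a ')' with nothing open is silently ignored;
            // a stricter parser might throw instead.
        }
        return results;
    }
}
```

Calling ParenSketch.ExtractParenthesized("(outer (inner))") yields two entries, "inner" followed by "outer (inner)". To keep only the outermost groups, one option is to record a result only when the stack is empty after the pop.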

string mainString = @"~(Homo Sapiens means (human being)) or man or ~woman";
List<string> checkList = new List<string> { "homo sapiens", "human", "man", "woman" };
// build subpattern "(?:homo sapiens|human|man|woman)"
string searchAlternation = "(?:" + String.Join("|", checkList.ToArray()) + ")";
MatchCollection matches = Regex.Matches(
    mainString,
    @"(?<=~|(?(Depth)(?!))~[(](?>[^()]+|(?<-Depth>)?[(]|(?<Depth>[)]))*)"+searchAlternation,
    RegexOptions.IgnoreCase
);
(?<=                 # lookbehind (.NET reads it from right to left, which is
                     # why ) pushes onto the stack and ( pops from it)
  ~                  # if there is a literal ~ to the left of our string, we're good
|                    # OR
  (?(Depth)(?!))     # if there is something left on the stack, we started outside
                     # of the parentheses that open at "~("
  ~[(]               # match a literal ~(
  (?>                # subpattern to analyze parentheses. the > makes the group
                     # atomic, i.e. suppresses backtracking. Note: we can only do
                     # this because the three alternatives are mutually exclusive
    [^()]+           # consume any non-parens characters without caring about them
  |                  # OR
    (?<-Depth>)?     # pop the top of the stack IF possible. the last ? is necessary
                     # for cases like "human", where we start with a ( before there
                     # was a ) that could be popped.
    [(]              # match a literal (
  |                  # OR
    (?<Depth>[)])    # match a literal ) and push it onto the stack
  )*                 # repeat for as long as possible
)                    # end of lookbehind
(?:homo sapiens|human|man|woman)
                     # match one of the words in the check list
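
To turn the matches back into a filtered check list, something along these lines might work. It is a sketch, not part of the answer above: the class name TildeTermFilter and the method name Filter are invented for illustration, and the Regex.Escape call is an addition so that terms containing regex metacharacters are treated literally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TildeTermFilter
{
    // The lookbehind from the pattern above, copied unchanged.
    private const string Lookbehind =
        @"(?<=~|(?(Depth)(?!))~[(](?>[^()]+|(?<-Depth>)?[(]|(?<Depth>[)]))*)";

    // Hypothetical helper: keeps the terms of checkList that occur in
    // mainString directly after "~" or inside a "~(" ... ")" group.
    public static List<string> Filter(string mainString, IEnumerable<string> checkList)
    {
        var terms = checkList.ToList();

        // Assumption: escaping each term is desirable; the original joins them raw.
        string alternation =
            "(?:" + string.Join("|", terms.Select(t => Regex.Escape(t))) + ")";

        // Collect the matched text case-insensitively, since the regex ignores case.
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in Regex.Matches(mainString, Lookbehind + alternation,
                                          RegexOptions.IgnoreCase))
        {
            found.Add(m.Value);
        }

        // Return the terms in the check list's own order and spelling.
        return terms.Where(t => found.Contains(t)).ToList();
    }
}
```

For the main string from the question, the intent is that "homo sapiens", "human" and "woman" are kept and "man" is dropped. Note that the alternation has no word boundaries, so a term can also match as part of a longer word (for example "human" inside "~humanity"); wrapping it in \b...\b would be one way to tighten that.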