C#: specifying what not to match in a regular expression (tags: C#, Regex)

My input consists of nested parentheses, where nodes contain similar nodes inside them. Sample data:

(S (S (NP (PRP It)) (VP (VP (VBZ has) (VP (VBN been) (PP (IN over) (NP (NN half))) (NP (DT a) (NN year)) (SBAR (IN since) (S (NP (DT a) (ADJP (CD 19.5) (NN %)) (NN tax)) (VP (VBD was) (VP (VBN imposed) (PP (IN by) (NP (NP (NN government)) (PP (IN of) (NP (NNP Punjab))))) (PP (IN on) (NP (DT the) (NN internet) (NNS services))))))))) (CC and) (VP (ADVP (RB just)) (VBP like) (NP (JJ YouTube) (NN ban)) (PRT (RP back)) (PP (IN in) (NP (CD 2012)))))) (, ,) (NP (NP (DT this) (NN internet) (NN tax)) (PP (IN in) (NP (NP (DT the) (JJS largest) (NN province)) (PP (IN of) (NP (NNP Pakistan)))))) (VP (VBZ does) (RB n't) (VP (VB seem) (S (VP (TO to) (VP (VB see) (NP (PRP$ its) (NN end)) (PP (ADVP (NP (DT any) (NN time)) (RB soon)) (IN despite) (NP (NP (JJ various) (JJ verbal) (NN commitment)) (PP (IN by) (NP (JJ Chief) (NNP Minister) (NNP Shahbaz) (NNP Sharif) (CC and) (JJ Provincial) (NN Finance) (NNP Minister) (NNP Ayesha) (NNPS Ghaus) (NNP Pasha))))) (PP (IN with) (NP (PDT all) (DT the) (NNS stakeholders)))))))) (. .))
(S (NP (JJ Inside) (NNS reports)) (VP (VBD confirmed) (SBAR (IN that) (S (NP (NP (DT a) (NN camp) (NN lead)) (PP (IN by) (NP (NP (NNP Chairman) (NNP Punjab) (NNP Revenue) (NNP Authority)) (PRN (-LRB- -LRB-) (NP (NNP PRA)) (-RRB- -RRB-))))) (VP (VBZ is) (NP (DT no) (NN mood) (S (VP (TO to) (VP (VB revoke) (NP (DT the) (NN tax)) (SBAR (IN that) (S (NP (PRP it)) (VP (VBD implemented) (PP (IN on) (NP (NNP May) (CD 28) (, ,) (CD 2014)))))))))))))) (. .))
(S (PP (IN On) (NP (DT the) (JJ other) (NN hand))) (, ,) (NP (NP (NNS people)) (PP (IN like) (NP (NP (NNP Chairman) (NNP PITB)) (, ,) (NP (NNP Umar) (NNP Saif)) (, ,)))) (VP (VBZ has) (VP (VBN been) (VP (VBG speaking) (PP (IN against) (NP (JJ such) (JJ anti-technology) (NN tax)))))) (. .))
(S (PP (IN In) (NP (PRP$ his) (JJ various) (NN statement))) (, ,) (NP (PRP he)) (VP (VBZ has) (VP (VBN shown) (NP (NN hope)) (SBAR (IN that) (S (NP (NNS things)) (VP (MD would) (VP (VB get) (NP (QP (JJR better) (IN but) (DT no)) (NNS signs)) (ADVP (RB yet)))))))) (. .))
I am interested in matching a VP node that happens to contain several other VP nodes, so I want to match the last VP node. My regex:

\(VP\s*((?!VP)|[^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)
What I want to match: in line 1, for example, I want only the innermost VP:

(VP (VBN been) (PP (IN over) (NP (NN half))) (NP (DT a) (NN year)) (SBAR (IN since) (S (NP (DT a) (ADJP (CD 19.5) (NN %)) (NN tax)) (VP (VBD was) (VP (VBN imposed) (PP (IN by) (NP (NP (NN government)) (PP (IN of) (NP (NNP Punjab))))) (PP (IN on) (NP (DT the) (NN internet) (NNS services))))))))) (CC and) (VP (ADVP (RB just)) (VBP like) (NP (JJ YouTube) (NN ban)) (PRT (RP back)) (PP (IN in) (NP (CD 2012)))
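So the two outer levels, (VP (VP (VBZ has) ..., should be ignored.

Any idea how to specify a negative match group when matching nested parentheses?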

You just need to find the last instance of VP as the starting point. The rest of the regex does not need to care whether it contains VP:

\(VP\s+(?!.*\(VP\s)([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))\)


\(VP\s+          # Look for "(VP "
(?!.*\(VP\s)     # Must not be followed by "(VP "
([^()]+|(?<Level>\()|(?<-Level>\)))+(?(Level)(?!))    # Arbitrary nested content
\)               # Closing parenthesis for "(VP ..."
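
As a rough illustration of how the pattern above could be used from C#, here is a minimal sketch. The class name LastVpDemo, the method name FindLastVpPerLine, and the shortened sample strings in the usage note are made up for this example and are not from the original answer. One thing worth keeping in mind: without RegexOptions.Singleline the dot does not match a newline, so the lookahead only checks the rest of the current line and "last" effectively means "last (VP on each line".

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LastVpDemo
{
    // Same pattern as in the answer; IgnorePatternWhitespace allows inline comments.
    // Level is a .NET balancing group used as a counter of open parentheses.
    private static readonly Regex LastVp = new Regex(@"
        \(VP\s+                # opening parenthesis, the VP tag, whitespace
        (?!.*\(VP\s)           # no further VP node opens later on this line
        (?:
            [^()]+             # text without parentheses
          | (?<Level>\()       # opening parenthesis: push
          | (?<-Level>\))      # closing parenthesis: pop (fails if nothing is open)
        )+
        (?(Level)(?!))         # fail if some parenthesis is still open
        \)                     # parenthesis closing the VP node itself
        ", RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: returns the matched VP node for each line of the input.
    public static List<string> FindLastVpPerLine(string text)
    {
        var found = new List<string>();
        foreach (Match m in LastVp.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the two-line input "(S (VP (VBZ has) (VP (VBN been) (NP (NN half)))) (. .))\n(S (NP (PRP he)) (VP (VBZ has)) (. .))", FindLastVpPerLine should return "(VP (VBN been) (NP (NN half)))" and "(VP (VBZ has))".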

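I don't think regex is the right tool for parsing recursive structures. This looks like a Lisp dialect; maybe you would have better luck finding or writing a Lisp parser/evaluator?

If you only need to avoid matching substrings that start with (VP (VP ..., wrap your pattern in a capturing group (removing the lookahead-only alternative), prepend
\(VP\s+\(VP\b|
to the pattern, and then read only the value of capture group 1. Also, your expected value contains an unbalanced number of ( and ).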
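
The first comment's suggestion of a parser instead of a regex could be sketched roughly as follows. This is not a full Lisp reader, only a single pass with a stack of open parentheses, and it assumes that "innermost VP" means a VP node with no other VP node anywhere inside it. The names VpScanner, Frame, and InnermostVpNodes are invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class VpScanner
{
    // One entry per currently open parenthesis.
    private sealed class Frame
    {
        public int Start;        // index of the opening parenthesis
        public bool IsVp;        // node starts with "(VP "
        public bool ContainsVp;  // some VP node was closed inside this node
    }

    // Returns every VP node that has no VP node nested inside it, in closing order.
    public static List<string> InnermostVpNodes(string tree)
    {
        var result = new List<string>();
        var open = new Stack<Frame>();
        for (int i = 0; i < tree.Length; i++)
        {
            char c = tree[i];
            if (c == '(')
            {
                bool isVp = string.CompareOrdinal(tree, i, "(VP ", 0, 4) == 0;
                open.Push(new Frame { Start = i, IsVp = isVp });
            }
            else if (c == ')' && open.Count > 0)
            {
                Frame node = open.Pop();
                if (node.IsVp && !node.ContainsVp)
                {
                    result.Add(tree.Substring(node.Start, i - node.Start + 1));
                }
                // Tell the parent that a VP node lives somewhere below it.
                if (open.Count > 0 && (node.IsVp || node.ContainsVp))
                {
                    open.Peek().ContainsVp = true;
                }
            }
        }
        return result;
    }
}
```

Unlike the lookahead-based regex, this reports every innermost VP in the input rather than only the last one per line, and unbalanced closing parentheses are simply skipped instead of making the whole match fail.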