Negative lookbehind problem in R

r regex

I have a set of sentences:

w <- c("so i said er well it would n't surprise me if it could bloody talk",  # quote marker
        "we got fifteen, well thirteen minutes",                              
        "well she brought a pie and she brought some er punch round",         
        "so your dad said well have n't i been soft ?",                       # quote marker
        "And he went [pause] well I can't feel any. ",                        # quote marker
        "I goes well they'll improve the grant to start off with",            # quote marker
        "so with the chips as well this is about one sixty .",                
        "well we 're not all the same are we , but")

grep("(?<=said|goes|went).*well", w, value = T, perl = T)
[1] "so i said er well it would n't surprise me if it could bloody talk"
[2] "so your dad said well have n't i been soft ?"                      
[3] "And he went [pause] well I can't feel any. "                       
[4] "I goes well they'll improve the grant to start off with"

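The problem I have is that the negative lookbehind, which is meant to match the strings where "well" is not a quote marker, does not work. For example, this matches everything: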

grep("(?<!said|goes|went).*well", w, value = T, perl = T)
[1] "so i said er well it would n't surprise me if it could bloody talk" # not match
[2] "we got fifteen, well thirteen minutes"                              # match
[3] "well she brought a pie and she brought some er punch round"         # match    
[4] "so your dad said well have n't i been soft ?"                       # not match         
[5] "And he went [pause] well I can't feel any. "                        # not match             
[6] "I goes well they'll improve the grant to start off with"            # not match         
[7] "so with the chips as well this is about one sixty ."                # match      
[8] "well we 're not all the same are we , but"                          # match

How can I get the negative lookbehind to work in grep()?
Thanks in advance!
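That is because (?<!said|goes|went) matches a position in the string that is not immediately preceded by any of the strings defined in the lookbehind. Then .* matches any 0+ characters other than line break characters, as many as possible, and then the literal text "well" is matched. There are many such valid positions, the start of the string being one of them.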
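To make this concrete, here is a minimal sketch using one of the quote-marker sentences from the question. The variable names (x, m, start, matched) are only illustrative. It shows that the match begins at the very first character, where nothing precedes the position and the negative lookbehind is therefore satisfied.

```r
# One of the sentences that should NOT match, taken from the question.
x <- "so i said er well it would n't surprise me if it could bloody talk"

# Locate the first match of the negative-lookbehind pattern.
m <- regexpr("(?<!said|goes|went).*well", x, perl = TRUE)

start   <- as.integer(m)     # 1-based position where the match begins
matched <- regmatches(x, m)  # the text that was actually matched

print(start)    # 1
print(matched)  # "so i said er well"
```

The lookbehind is checked only once, at the position where the match starts, and .* is then free to run across "said" on its way to "well".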
The easiest way is to match the strings where "said", "goes" or "went" appears before "well" and skip them, and then match "well" in all other contexts:
\b(?:said|goes|went)\b.*\bwell\b(*SKIP)(*F)|\bwell\b

Note: if you use a lookahead-based pattern such as ^(?!.*\b(?:said|goes|went)\b).*\bwell\b, you may get false negatives when "said", "goes" or "went" appears after "well".
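The false negative can be illustrated with a short sketch. The test sentence is made up for this example, and the lookahead pattern is assumed to be ^(?!.*\b(?:said|goes|went)\b).*\bwell\b, which may differ in detail from what was originally posted.

```r
# Hypothetical sentence: "well" comes first, "said" appears later,
# so "well" is not a quote marker here and the string should match.
x <- "well he said nothing"

# Assumed lookahead-based pattern: rejects any string containing said/goes/went.
lookahead <- "^(?!.*\\b(?:said|goes|went)\\b).*\\bwell\\b"
# Skip-and-fail pattern from the answer.
skip_fail <- "\\b(?:said|goes|went)\\b.*\\bwell\\b(*SKIP)(*F)|\\bwell\\b"

la <- grepl(lookahead, x, perl = TRUE)  # FALSE: a false negative
sf <- grepl(skip_fail, x, perl = TRUE)  # TRUE: the leading "well" is found

print(c(lookahead = la, skip_fail = sf))
```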
Pattern details

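\b(?:said|goes|went)\b.*\bwell\b(*SKIP)(*F) - a whole word "said", "goes" or "went", then any 0+ characters, as many as possible, and then a whole word "well"; once this match is found, it is discarded and the regex engine resumes searching at the position where the failure occurred, that is, right after the discarded text
| - or
\bwell\b - a whole word "well"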
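The effect of the two alternatives can be seen by listing match positions on a single string. This is a small sketch with a made-up sentence that contains "well" both before and after "said"; the names x, pat, hits and positions are illustrative only.

```r
# Hypothetical sentence with two occurrences of "well".
x <- "well then he said well fine"

pat <- "\\b(?:said|goes|went)\\b.*\\bwell\\b(*SKIP)(*F)|\\bwell\\b"

# gregexpr() returns the start positions of all matches in the string.
hits <- gregexpr(pat, x, perl = TRUE)[[1]]
positions <- as.integer(hits)

print(positions)                 # 1
print(regmatches(x, list(hits))) # the matched text: "well"
```

Only the first "well" is reported. The stretch from "said" through the second "well" is consumed by the first alternative and then thrown away by (*SKIP)(*F), so the second "well" is never tried as a match start.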

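In R, the backslashes must be doubled inside the string literal:

grep("\\b(?:said|goes|went)\\b.*\\bwell\\b(*SKIP)(*F)|\\bwell\\b", w, value = TRUE, perl = TRUE)
# [1] "we got fifteen, well thirteen minutes"                     
# [2] "well she brought a pie and she brought some er punch round"
# [3] "so with the chips as well this is about one sixty ."       
# [4] "well we 're not all the same are we , but"

Comments:

Don't you just want to invert the match from your positive lookbehind? grep(..., invert = TRUE)

I am familiar with that option, but in this post I am specifically interested in digging into negative lookbehind. Thanks anyway.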
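For comparison, the grep(..., invert = TRUE) suggestion from the comments can be sketched as follows. It reuses the sentences and the positive lookbehind pattern from the question; the names positive and non_quote are illustrative.

```r
w <- c("so i said er well it would n't surprise me if it could bloody talk",
       "we got fifteen, well thirteen minutes",
       "well she brought a pie and she brought some er punch round",
       "so your dad said well have n't i been soft ?",
       "And he went [pause] well I can't feel any. ",
       "I goes well they'll improve the grant to start off with",
       "so with the chips as well this is about one sixty .",
       "well we 're not all the same are we , but")

# Positive lookbehind from the question: "well" somewhere after said/goes/went.
positive <- "(?<=said|goes|went).*well"

# invert = TRUE returns the elements that do NOT match the pattern.
non_quote <- grep(positive, w, value = TRUE, perl = TRUE, invert = TRUE)
print(non_quote)
```

On this data the result is the same four sentences as with the skip-and-fail pattern. The two approaches are not equivalent in general: inversion rejects a whole string as soon as any "well" follows one of the verbs, and it only works here because every sentence contains "well".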