Negative lookbehind in R with multi-word separation


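I am using R for some string processing and want to identify strings that contain a word with a particular root, where that word is not preceded by another word with a particular root.

Here is a simple toy example. Suppose I want to identify the strings that contain the word "cat/s" with no "dog/s" anywhere before it in the string.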

 tests = c(
   "dog cat",
   "dogs and cats",
   "dog and cat", 
   "dog and fluffy cats",
   "cats and dogs", 
   "cat and dog",  
   "fluffy cats and fluffy dogs")  
With this pattern, I can pull out the strings in which a dog comes before a cat:

 pattern = "(dog(s|).*)(cat(s|))"
 grep(pattern, tests, perl = TRUE, value = TRUE)

[1] "dog cat"  "dogs and cats"   "dog and cat"   "dog and fluffy cats"
My negative lookbehind is the problem:

 neg_pattern = "(?<!dog(s|).*)(cat(s|))"
 grep(neg_pattern, tests, perl = TRUE, value = TRUE)

I hope this helps:

tests = c(
  "dog cat",
  "dogs and cats",
  "dog and cat", 
  "dog and fluffy cats",
  "cats and dogs", 
  "cat and dog",  
  "fluffy cats and fluffy dogs"
)

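# remove strings that have cats after dogs
# (logical indexing with !grepl() stays correct even when nothing matches,
#  whereas tests[-grep(...)] would return an empty vector in that case)
tests = tests[!grepl(pattern = "dog(?:s|).*cat(?:s|)", x = tests, perl = TRUE)]

# select only strings that contain cats
tests = tests[grepl(pattern = "cat(?:s|)", x = tests, perl = TRUE)]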

tests

[1] "cats and dogs"               "cat and dog"                
[3] "fluffy cats and fluffy dogs"
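I am not sure whether you wanted to do this with a single expression, but
regular expressions are still very useful when applied iteratively.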
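
The two filtering steps above can also be wrapped into one reusable function. This is only a minimal sketch: `target_not_after` is a hypothetical helper name, not something from the question or from base R, and it assumes both arguments are valid PCRE fragments.

```r
tests <- c("dog cat", "dogs and cats", "dog and cat", "dog and fluffy cats",
           "cats and dogs", "cat and dog", "fluffy cats and fluffy dogs")

# Hypothetical helper: TRUE where `target` occurs in x and there is no
# `blocker` somewhere before a `target` in the same string.
target_not_after <- function(x, target, blocker) {
  has_target    <- grepl(target, x, perl = TRUE)
  # wrap both fragments in non-capturing groups so alternations stay intact
  blocked_first <- grepl(paste0("(?:", blocker, ").*(?:", target, ")"),
                         x, perl = TRUE)
  has_target & !blocked_first
}

keep <- target_not_after(tests, target = "cats?", blocker = "dogs?")
tests[keep]
```

Combining two logical vectors keeps the original `tests` intact and avoids the case where negative indexing with an empty `grep()` result drops every element.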

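Yes, your negative lookbehind is "problematic": unlike a lookahead, a lookbehind cannot contain a pattern of unknown length. It looks like you can use a negative lookahead instead:
"^(?!.*dog.*cat).*cat"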
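
To check the lookahead pattern against the toy data from the question, something like the following should work. It is a sketch rather than a definitive solution; the `lookbehind_ok` variable is just an illustrative name, and the `tryCatch` call is only there to show that the original lookbehind is rejected by the regex engine.

```r
tests <- c("dog cat", "dogs and cats", "dog and cat", "dog and fluffy cats",
           "cats and dogs", "cat and dog", "fluffy cats and fluffy dogs")

# The negative lookahead is anchored at the start of the string, so it
# rejects any string in which "dog" is later followed by "cat".
pattern <- "^(?!.*dog.*cat).*cat"
result <- grep(pattern, tests, perl = TRUE, value = TRUE)
result

# The lookbehind from the question contains ".*", which has no bounded
# length, so the pattern fails to compile and grepl() signals a condition.
lookbehind_ok <- tryCatch({
  grepl("(?<!dog(s|).*)(cat(s|))", "cat", perl = TRUE)
  TRUE
}, warning = function(w) FALSE, error = function(e) FALSE)
lookbehind_ok
```

Because the lookahead sits right after `^`, it is evaluated once per string and can scan the whole string, which is what the lookbehind was meant to do from the other direction.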
See, it seems you cannot do what you want with a single regex in R. There is also an identical question elsewhere with a good answer.
@WiktorStribiżew I am trying to get at the word-root component of my problem. For example, cats vs cat vs caterpillar... I could use cat(s|erpillar|) and so on.
Then never oversimplify. Post the details of your real-world problem. Someone who is awake will surely help you.
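
On the word-root point, one possible approach is to add word boundaries (`\\b`) around each root and list the allowed suffixes explicitly. The sketch below assumes only the plural "s" should count; the sample vector `x` and the names `cat_re` and `dog_re` are made up for illustration.

```r
x <- c("cat and dog", "caterpillar and dog", "dog and caterpillar",
       "dogs and cats", "hotdog stand and cats", "cats and dogs")

# Hypothetical root patterns: \\b restricts matches to whole words, and the
# optional group lists the suffixes that should still count as the root.
cat_re <- "\\bcat(?:s)?\\b"
dog_re <- "\\bdog(?:s)?\\b"

# Same lookahead idea as above, built from the two root patterns.
pattern <- paste0("^(?!.*", dog_re, ".*", cat_re, ").*", cat_re)
matched <- grep(pattern, x, perl = TRUE, value = TRUE)
matched
```

Here "caterpillar" no longer counts as a cat and "hotdog" no longer counts as a dog. If caterpillars should count, the suffix group could be widened, for example to `(?:s|erpillar)?`.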