R: match strings with a regular expression, but ignore (not exclude) certain phrases that contain the matched string
I am trying to find a way to match certain strings in a dataset, but ignore (rather than exclude) some expressions that contain the match.
clin_pres <- c("Patient A received yellow fever vaccine, and had a fever", "Patient B received the yellow fever vaccine but had no fever", "Patient C returned from Bali yesterday and now has a fever", "Patient D had no fever last week but now has a fever")
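My current attempt outputs:
[1] FALSE FALSE TRUE FALSE
But I only want to ignore "yellow fever vaccine" and "no fever" as matches, rather than exclude an entry whenever they occur, to get the output:
[1] TRUE FALSE TRUE TRUE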
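The difference between excluding and ignoring can be sketched as follows. The exclusion expression here is only an assumed illustration of how the first output can arise; it is not necessarily the code that produced it, and the name `excluded` is made up for this example.

```r
clin_pres <- c("Patient A received yellow fever vaccine, and had a fever",
               "Patient B received the yellow fever vaccine but had no fever",
               "Patient C returned from Bali yesterday and now has a fever",
               "Patient D had no fever last week but now has a fever")

# Assumed illustration: drop every entry that contains an unwanted phrase,
# even if it also contains a genuine "fever" elsewhere
excluded <- grepl("fever", clin_pres) &
  !grepl("yellow fever vaccine|no fever", clin_pres)

print(excluded)
# Patients A and D are lost because the unwanted phrase vetoes the whole entry
```

Ignoring, by contrast, means the unwanted phrases simply do not count as hits, while any other "fever" in the same entry still does.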
Any help or suggestions?
There are two possible regex solutions:
grepl("\\b(?<!\\bno )fever\\b(?<!\\byellow fever(?= vaccine))",clin_pres, ignore.case = TRUE, perl=TRUE)
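Details of the two patterns follow.
The first one, \b(?<!\bno )fever\b(?<!\byellow fever(?= vaccine)), matches:
- \b - a word boundary
- (?<!\bno ) - the word "no" followed by a space is not allowed immediately before the current position
- fever - the word itself
- \b - a word boundary
- (?<!\byellow fever(?= vaccine)) - the "fever" just matched must not be the one in "yellow fever" that is immediately followed by " vaccine"
The second one, (?:\b(?:no\s+fever|yellow\s+fever\s+vaccine)\b)(*SKIP)(*F)|\bfever\b, matches:
- (?:\b(?:no\s+fever|yellow\s+fever\s+vaccine)\b) - "no fever" or "yellow fever vaccine" as whole words, with one or more whitespace characters between the words
- (*SKIP)(*F) - discards the match found so far and resumes the search from the position where it ended
- | - or
- \bfever\b - the whole word "fever"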
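To see which occurrences of "fever" each pattern keeps, a minimal sketch can count the surviving matches per entry with `gregexpr` and `regmatches`. The names `p_lookaround`, `p_skip` and `count_matches` are made up for this illustration; the patterns themselves are the two described above.

```r
clin_pres <- c("Patient A received yellow fever vaccine, and had a fever",
               "Patient B received the yellow fever vaccine but had no fever",
               "Patient C returned from Bali yesterday and now has a fever",
               "Patient D had no fever last week but now has a fever")

# Pattern 1: lookarounds reject "no fever" and "yellow fever vaccine"
p_lookaround <- "\\b(?<!\\bno )fever\\b(?<!\\byellow fever(?= vaccine))"
# Pattern 2: consume the unwanted phrases, then discard them with (*SKIP)(*F)
p_skip <- "(?:\\b(?:no\\s+fever|yellow\\s+fever\\s+vaccine)\\b)(*SKIP)(*F)|\\bfever\\b"

# Hypothetical helper: number of matches of `pattern` in each element of `x`
count_matches <- function(pattern, x) {
  m <- gregexpr(pattern, x, ignore.case = TRUE, perl = TRUE)
  lengths(regmatches(x, m))
}

counts_lookaround <- count_matches(p_lookaround, clin_pres)
counts_skip <- count_matches(p_skip, clin_pres)

print(counts_lookaround)  # 1 0 1 1
print(counts_skip)        # 1 0 1 1
# An entry is a hit whenever at least one "fever" survives
print(counts_skip > 0)
```

Both patterns leave exactly one "fever" in entries A, C and D and none in B, which is why `grepl` returns TRUE FALSE TRUE TRUE for either of them.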
Remove the unwanted strings, then grep for fever:
grepl("fever", gsub("yellow fever vaccine|no fever", "", clin_pres))
## [1] TRUE FALSE TRUE TRUE
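The question used ignore.case=TRUE, but that is not required here because the relevant words in the input are all lower case. If another problem has upper-case letters, just replace clin_pres with tolower(clin_pres), or add ignore.case=TRUE to both grepl and gsub.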
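Both case-handling options can be sketched on mixed-case input. The example strings and the names `mixed_case`, `unwanted`, `res_lower` and `res_ignore` are invented for illustration and are not part of the original data.

```r
# Invented example strings with mixed case (not from the original data)
mixed_case <- c("Patient E had a Fever",
                "Patient F had No Fever",
                "Patient G received Yellow Fever Vaccine and had a fever")

unwanted <- "yellow fever vaccine|no fever"

# Option 1: lower-case the input once, then reuse the original calls
res_lower <- grepl("fever", gsub(unwanted, "", tolower(mixed_case)))

# Option 2: keep the input as-is and make both calls case-insensitive
res_ignore <- grepl("fever",
                    gsub(unwanted, "", mixed_case, ignore.case = TRUE),
                    ignore.case = TRUE)

print(res_lower)   # TRUE FALSE TRUE
print(res_ignore)  # TRUE FALSE TRUE
```

Note that ignore.case has to be set on both calls in the second option: if it is set only on grepl, gsub leaves "No Fever" in place and the entry is wrongly reported as a hit.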
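I don't think this should be downvoted. It answers the question, gives the correct result, generalizes easily, and is simpler than any of the other answers given.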
The second regex solution, written as a full call:
grepl("(?:\\b(?:no\\s+fever|yellow\\s+fever\\s+vaccine)\\b)(*SKIP)(*F)|\\bfever\\b",clin_pres, ignore.case = TRUE, perl=TRUE)
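One practical difference between the two regex solutions is how they treat irregular spacing. The lookbehind (?<!\bno ) has a fixed width and expects exactly one space, whereas no\s+fever accepts any run of whitespace. The following sketch uses invented strings (`spaced`) to show this; it is an illustration under that assumption, not a claim about the original dataset.

```r
# Invented strings: the first has two spaces between "no" and "fever"
spaced <- c("Patient E had no  fever",
            "Patient F had no fever")

p_lookaround <- "\\b(?<!\\bno )fever\\b(?<!\\byellow fever(?= vaccine))"
p_skip <- "(?:\\b(?:no\\s+fever|yellow\\s+fever\\s+vaccine)\\b)(*SKIP)(*F)|\\bfever\\b"

res_lookaround <- grepl(p_lookaround, spaced, ignore.case = TRUE, perl = TRUE)
res_skip <- grepl(p_skip, spaced, ignore.case = TRUE, perl = TRUE)

print(res_lookaround)  # TRUE FALSE: the double space slips past the lookbehind
print(res_skip)        # FALSE FALSE: \s+ absorbs any amount of whitespace
```

If the data may contain inconsistent spacing, the (*SKIP)(*F) version is therefore the safer choice of the two.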