
R: regular expression with comma-separated items


I have the following dataset:

df <- data.frame(id = c(1,2,3), names = c( "Adam Jones, John David, Maddy Kones", 
"Adam Smith, Maddy Kones, John David", "Maddy Kones, John Peterson, Adam Smith"))
I don't know how to use a regular expression here. I have tried:

output <- df [grep("Adam" [^,]* "John", df$names),]

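A basic approach here is to use grepl with an appropriate pattern:

Adam\b[^,]*,\s*John.*

This matches Adam, followed by a word boundary and anything up to the first comma, and then John as the very next term (after optional whitespace). In an R string literal each backslash has to be doubled. We don't have any ugly edge cases, because if John must follow Adam, there will always be a comma separator between the two names.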

Code:

df[grepl("Adam\\b[^,]*,\\s*John.*", df$names), ]
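To see what the pattern does row by row, here is a minimal sketch, assuming the three-row data frame from the question. The names pattern, matches and result are introduced only for illustration, and stringsAsFactors = FALSE is an added assumption so that names is a character column on any R version.

```r
# Data frame from the question; stringsAsFactors = FALSE is an assumption
# added here so that `names` is character regardless of R version.
df <- data.frame(
  id = c(1, 2, 3),
  names = c("Adam Jones, John David, Maddy Kones",
            "Adam Smith, Maddy Kones, John David",
            "Maddy Kones, John Peterson, Adam Smith"),
  stringsAsFactors = FALSE
)

# Same pattern as above, with backslashes doubled for the R string literal
pattern <- "Adam\\b[^,]*,\\s*John.*"

# grepl returns one logical value per row
matches <- grepl(pattern, df$names)
matches
# TRUE FALSE FALSE

# Keep only the rows where John is the item right after Adam
result <- df[matches, ]
```

Only the first row is kept: in the second row Maddy Kones sits between the two names, and in the third row Adam comes after John.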

Update

The original solution does not give the expected answer when either "Adam" or "John" is missing. For example, for this data frame

df
#  id                                  names
#1  1    Adam Jones, John David, Maddy Kones
#2  2    Adam Smith, Maddy Kones, John David
#3  3 Maddy Kones, John Peterson, Adam Smith
#4  4                 Adam Smith, Ronak Shah 
With the original solution, we would get the following output:

#   id                               names
#1   1 Adam Jones, John David, Maddy Kones
#NA NA                                <NA>
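Original answer

Another option is to split all the names on the comma and use grep to find the positions where "John" and "Adam" occur, keeping a row only when the difference between the two positions is 1 (because "John" directly follows "Adam").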

Original version:

df[sapply(strsplit(df$names, ","), function(x) 
                      grep("John", x) - grep("Adam", x)) == 1, ]

#  id                               names
#1  1 Adam Jones, John David, Maddy Kones

Updated version, which wraps the comparison in isTRUE so that a row missing either name is dropped instead of producing an NA row:

df[sapply(strsplit(df$names, ","), function(x) 
       isTRUE(grep("John", x) - grep("Adam", x) == 1)), ]

#  id                               names
#1  1 Adam Jones, John David, Maddy Kones
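The split-and-compare approach can be checked on the four-row data frame from the update. This is a minimal sketch: the names df4, parts, adam_then_john, keep and result are hypothetical, stringsAsFactors = FALSE is an added assumption so that strsplit receives character input, and vapply is swapped in for sapply so the result is always a logical vector.

```r
# Four-row data frame from the update; row 4 has no "John"
df4 <- data.frame(
  id = 1:4,
  names = c("Adam Jones, John David, Maddy Kones",
            "Adam Smith, Maddy Kones, John David",
            "Maddy Kones, John Peterson, Adam Smith",
            "Adam Smith, Ronak Shah"),
  stringsAsFactors = FALSE
)

# Split every entry on the comma: one character vector per row
parts <- strsplit(df4$names, ",")

# TRUE only when "John" sits exactly one position after "Adam".
# When a name is missing, grep returns integer(0), the comparison
# yields logical(0), and isTRUE turns that into FALSE.
adam_then_john <- function(x) {
  isTRUE(grep("John", x) - grep("Adam", x) == 1)
}

keep <- vapply(parts, adam_then_john, logical(1))
keep
# TRUE FALSE FALSE FALSE

result <- df4[keep, ]
```

Because every element of keep is a plain TRUE or FALSE, the subset contains only the first row and no NA row.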