R: compare each row of one data frame against multiple rows of another data frame and get the result
I have two datasets, df1 and df2.
df1
c1 match c3 c4
AA1 AB cat dog
AA1 CD dfs abd
AA1 EF js hn
AA1 GH bsk jtd
AA2 AB cat mouse
AA2 CD adb mop
AA2 EF powas qwert
AA2 GH sms mms
AA3 AB i j
AA3 CD fgh ejk
AA3 EF mib loi
AA3 GH revit roger
df2
match d2 result
AB cat friendly
AB mouse enemy
CD dfs r1
CD adb r1
CD fgh r2
CD ejk r3
EF mib some_result
GH sms sent
GH mms sent
IJ xxx yyy
KL crt zzz
KL rrr qqq
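I want to match df1 and df2 on the column "match" and add two new columns, "result_c1" and "result_c2", to df1. result_c1 is obtained by first matching on the match column and then matching c3 in df1 against d2 in df2 to get the corresponding result from df2. result_c2 is obtained the same way, but matching c4 in df1 against d2 in df2. If there is no match, return "no_match". Is there an efficient way to do this?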
result
c1 match c3 c4 result_c1 result_c2
AA1 AB cat dog friendly no_match
AA1 CD dfs adb r1 r1
AA1 EF js hn no_match no_match
AA1 GH bsk jtd no_match no_match
AA2 AB cat mouse friendly enemy
AA2 CD adb mop r1 no_match
AA2 EF powas qwert no_match no_match
AA2 GH sms mms sent sent
AA3 AB i j no_match no_match
AA3 CD fgh ejk r2 r3
AA3 EF mib loi some_result no_match
AA3 GH revit roger no_match no_match
The data is attached below:
df1 <- data.frame(list(c1 = c("AA1", "AA1", "AA1", "AA1", "AA2", "AA2", "AA2", "AA2",
"AA3", "AA3", "AA3", "AA3"), match = c("AB", "CD", "EF", "GH",
"AB", "CD", "EF", "GH",
"AB", "CD", "EF", "GH"),
c3 = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms", "i",
"fgh", "mib", "revit"), c4 = c("dog", "abd", "hn", "jtd", "mouse",
"mop", "qwert", "mms", "j", "ejk", "loi", "roger")))
df2 <- data.frame(list(match = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
d2 = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms", "xxx", "crt", "rrr"),
result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result", "sent", "sent", "yyy", "zzz", "qqq")))
One way is to use a custom function:
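# Return the df2 result whose d2 equals the first value of x found in y,
# or "no_match" when no value of x occurs in y
apply_fun <- function(x, y, r) {
  inds <- x %in% y
  if (any(inds)) r[match(x[which.max(inds)], y)] else "no_match"
}

library(dplyr)

df1 %>%
  left_join(df2, by = "match") %>%   # pair each df1 row with every df2 row sharing its match
  mutate_all(as.character) %>%       # make sure no column is a factor
  group_by(c1, match) %>%            # one group per df1 row in this data
  summarise(result_c1 = apply_fun(c3, d2, result),
            result_c2 = apply_fun(c4, d2, result))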
# c1 match result_c1 result_c2
# <chr> <chr> <chr> <chr>
# 1 AA1 AB friendly no_match
# 2 AA1 CD r1 no_match
# 3 AA1 EF no_match no_match
# 4 AA1 GH no_match no_match
# 5 AA2 AB friendly enemy
# 6 AA2 CD r1 no_match
# 7 AA2 EF no_match no_match
# 8 AA2 GH sent sent
# 9 AA3 AB no_match no_match
#10 AA3 CD r2 r3
#11 AA3 EF some_result no_match
#12 AA3 GH no_match no_match
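Note that the summarise() step above returns only c1, match and the two result columns, so c3 and c4 are dropped. As a possible alternative (a sketch, not part of the original answer), the lookup can be written as two left joins on the pair (match, d2), which keeps every df1 column. It assumes each (match, d2) pair occurs at most once in df2, which holds for the sample data; otherwise the joins would duplicate df1 rows. The name `joined` is only illustrative.

```r
library(dplyr)

# Sample data copied from the question, stored as plain character columns
df1 <- data.frame(
  c1    = rep(c("AA1", "AA2", "AA3"), each = 4),
  match = rep(c("AB", "CD", "EF", "GH"), times = 3),
  c3    = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms",
            "i", "fgh", "mib", "revit"),
  c4    = c("dog", "abd", "hn", "jtd", "mouse", "mop", "qwert", "mms",
            "j", "ejk", "loi", "roger"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  match  = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
  d2     = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms",
             "xxx", "crt", "rrr"),
  result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result",
             "sent", "sent", "yyy", "zzz", "qqq"),
  stringsAsFactors = FALSE
)

joined <- df1 %>%
  # first lookup: (match, c3) in df1 against (match, d2) in df2
  left_join(rename(df2, result_c1 = result),
            by = c("match" = "match", "c3" = "d2")) %>%
  # second lookup: (match, c4) in df1 against (match, d2) in df2
  left_join(rename(df2, result_c2 = result),
            by = c("match" = "match", "c4" = "d2")) %>%
  # rows without a partner in df2 come back as NA; label them as requested
  mutate(result_c1 = coalesce(result_c1, "no_match"),
         result_c2 = coalesce(result_c2, "no_match"))

print(joined)
```

Because left_join() keeps the rows of its first argument in order, `joined` has the same 12 rows as df1 with the two result columns appended.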
Here is a solution using base R:
# Look up each (match, c3) and (match, c4) pair among the (match, d2) pairs of df2
df1$result_c1 = with(df1, ifelse(is.na(match(paste(match, c3), with(df2, paste(match, d2)))),
                                 "no match",
                                 as.character(df2$result[match(paste(match, c3), with(df2, paste(match, d2)))])))
df1$result_c2 = with(df1, ifelse(is.na(match(paste(match, c4), with(df2, paste(match, d2)))),
                                 "no match",
                                 as.character(df2$result[match(paste(match, c4), with(df2, paste(match, d2)))])))
This gives:
> df1
    c1 match    c3    c4   result_c1 result_c2
1  AA1    AB   cat   dog    friendly  no match
2  AA1    CD   dfs   abd          r1  no match
3  AA1    EF    js    hn    no match  no match
4  AA1    GH   bsk   jtd    no match  no match
5  AA2    AB   cat mouse    friendly     enemy
6  AA2    CD   adb   mop          r1  no match
7  AA2    EF powas qwert    no match  no match
8  AA2    GH   sms   mms        sent      sent
9  AA3    AB     i     j    no match  no match
10 AA3    CD   fgh   ejk          r2        r3
11 AA3    EF   mib   loi some_result  no match
12 AA3    GH revit roger    no match  no match
I selected this as the accepted solution because it is faster than the other answers.
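The base R lookup above evaluates the same match() call twice per column. A minimal sketch of the same idea wrapped in a helper is shown below. The function `lookup_result` and its `default` argument are hypothetical names introduced here, not part of the original answer, and the default label is "no_match" as requested in the question rather than the "no match" used in the answer. It also assumes that pasting key and value with a space cannot produce ambiguous strings, which holds for the sample data since none of the values contain spaces.

```r
# Sample data copied from the question, stored as plain character columns
df1 <- data.frame(
  c1    = rep(c("AA1", "AA2", "AA3"), each = 4),
  match = rep(c("AB", "CD", "EF", "GH"), times = 3),
  c3    = c("cat", "dfs", "js", "bsk", "cat", "adb", "powas", "sms",
            "i", "fgh", "mib", "revit"),
  c4    = c("dog", "abd", "hn", "jtd", "mouse", "mop", "qwert", "mms",
            "j", "ejk", "loi", "roger"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  match  = c("AB", "AB", "CD", "CD", "CD", "CD", "EF", "GH", "GH", "IJ", "KL", "KL"),
  d2     = c("cat", "mouse", "dfs", "adb", "fgh", "ejk", "mib", "sms", "mms",
             "xxx", "crt", "rrr"),
  result = c("friendly", "enemy", "r1", "r1", "r2", "r3", "some_result",
             "sent", "sent", "yyy", "zzz", "qqq"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each (key, value) pair, return the result of the
# first row of `lookup` with the same (match, d2) pair, or `default` if none
lookup_result <- function(key, value, lookup, default = "no_match") {
  idx <- match(paste(key, value), paste(lookup$match, lookup$d2))
  out <- as.character(lookup$result[idx])
  out[is.na(idx)] <- default   # unmatched pairs have an NA index
  out
}

df1$result_c1 <- lookup_result(df1$match, df1$c3, df2)
df1$result_c2 <- lookup_result(df1$match, df1$c4, df2)

print(df1)
```

Since match() is vectorized and computed once per column, this keeps the approach of the answer above while doing the lookup a single time for each new column.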