R - Matching values from 2 data frames based on multiple conditions (when the order of the lookup IDs is random)

Hi, I have two data frames:

df1 = data.frame(PersonId1=c(1,2,3,4,5,6,7,8,9,10,1),PersonId2=c(11,12,13,14,15,16,17,18,19,20,11),
             Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
             Event=c(1,1,1,1,2,2,2,2,2,2,2),
             Utility=c(20,-2,-5,10,30,2,1,.5,50,-1,60))


df2 = data.frame(PersonId1=c(11,15,9,1),PersonId2=c(1,5,19,11),
             Played_together = c(1,1,1,1),
             Event=c(1,2,2,2))

Where df1 looks like this:

      PersonId1 PersonId2 Played_together Event Utility
1          1        11               1     1    20.0
2          2        12               0     1    -2.0
3          3        13               0     1    -5.0
4          4        14               1     1    10.0
5          5        15               1     2    30.0
6          6        16               0     2     2.0
7          7        17               0     2     1.0
8          8        18               0     2     0.5
9          9        19               1     2    50.0
10        10        20               0     2    -1.0
11         1        11               1     2    60.0

df2 looks like this:

  PersonId1 PersonId2 Played_together Event
1        11         1               1     1
2        15         5               1     2
3         9        19               1     2
4         1        11               1     2

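Note that df2 is not simply the rows of df1 where Played_together == 1. (For example, PersonId1=4 and PersonId2=14 played together in event 1, but that pair does not exist in df2.)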

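Also note that although df2 is a subset of df1, the order in which the individuals appear in df2 is random. For example, in row 1 of df1 we see PersonId1=1 and PersonId2=11 for event 1, but in row 1 of df2 we see PersonId1=11 and PersonId2=1 for event 1. These two cases are exactly the same, and I want to look up the Utility value from df1 and bring it into df2. The merge has to be done per event. The final output should look like this: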

  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60

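I know R has a merge function, but I don't know what to do when the lookup IDs appear in random order. I would really appreciate any help. Thanks in advance.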
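Since the question mentions base R's merge(), here is a minimal base-R sketch of one way to handle the random order (an illustration, not taken from the answers below): stack df1 with a copy whose two ID columns are swapped, so every pair exists in both orientations, then merge on all four key columns. The names swapped, lookup, x and row_id are hypothetical; row_id only restores df2's original row order, because merge() sorts its output by the key columns.

```r
# Sample data from the question
df1 <- data.frame(PersonId1 = c(1,2,3,4,5,6,7,8,9,10,1),
                  PersonId2 = c(11,12,13,14,15,16,17,18,19,20,11),
                  Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                  Event = c(1,1,1,1,2,2,2,2,2,2,2),
                  Utility = c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 <- data.frame(PersonId1 = c(11,15,9,1), PersonId2 = c(1,5,19,11),
                  Played_together = c(1,1,1,1),
                  Event = c(1,2,2,2))

# Hypothetical lookup table: df1 plus a copy with the two IDs swapped,
# so a pair can be found whichever way round it is written in df2.
swapped <- df1
swapped$PersonId1 <- df1$PersonId2
swapped$PersonId2 <- df1$PersonId1
lookup <- unique(rbind(df1, swapped))

# Work on a copy of df2 and remember its row order (merge() re-sorts rows).
x <- df2
x$row_id <- seq_len(nrow(x))

# Match on the IDs and Event (plus Played_together, as the answers do);
# all.x = TRUE keeps unmatched df2 rows, like a left join.
result <- merge(x, lookup,
                by = c("PersonId1", "PersonId2", "Played_together", "Event"),
                all.x = TRUE)

# Restore df2's row order and drop the helper column.
result <- result[order(result$row_id),
                 c("PersonId1", "PersonId2", "Played_together", "Event", "Utility")]
rownames(result) <- NULL
print(result)
```

Stacking the swapped copy doubles the size of the lookup table, which is usually cheap compared with building a combined key row by row.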
Here's what I came up with:

library(dplyr)

rbind(
  left_join(df2, df1, by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                             "Played_together" = "Played_together", "Event" = "Event")),
  left_join(df2, df1, by = c("PersonId1" = "PersonId1", "PersonId2" = "PersonId2",
                             "Played_together" = "Played_together", "Event" = "Event"))
) %>%
  filter(!is.na(Utility))
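Basically, your data sometimes flips the PersonIds. We can bind the two joins (one reversed, one not) together and then filter out the rows where Utility is NA.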
Your output looks like this:

  PersonId1 PersonId2 Played_together Event Utility
1        11         1               1     1      20
2        15         5               1     2      30
3         9        19               1     2      50
4         1        11               1     2      60
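To make the rbind-then-filter step concrete, here is a sketch that runs the same two left_join() calls but stops before filter(). The intermediate names reversed, same_order and stacked are hypothetical. In this data every df2 row appears twice in the stacked result, once per orientation, and one of the two copies has an NA Utility; those are the rows that filter(!is.na(Utility)) removes.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(PersonId1 = c(1,2,3,4,5,6,7,8,9,10,1),
                  PersonId2 = c(11,12,13,14,15,16,17,18,19,20,11),
                  Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                  Event = c(1,1,1,1,2,2,2,2,2,2,2),
                  Utility = c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 <- data.frame(PersonId1 = c(11,15,9,1), PersonId2 = c(1,5,19,11),
                  Played_together = c(1,1,1,1),
                  Event = c(1,2,2,2))

# Join 1: df2's PersonId2 is matched against df1's PersonId1 and vice versa,
# so this join finds pairs written in the opposite order in df1.
reversed <- left_join(df2, df1,
                      by = c("PersonId2" = "PersonId1", "PersonId1" = "PersonId2",
                             "Played_together", "Event"))

# Join 2: the IDs are matched in the same order.
same_order <- left_join(df2, df1,
                        by = c("PersonId1", "PersonId2", "Played_together", "Event"))

# Stack both results without filtering.
stacked <- rbind(reversed, same_order)
print(stacked)

# Count the rows each join could not match (these carry NA Utility).
print(sum(is.na(stacked$Utility)))
```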
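One solution is to create a "Team" column from the combination of PersonId1 and PersonId2, built as min(PersonId):max(PersonId) so that both orderings of a pair get the same key. Then join df1 and df2 on Team and Event to get the desired data.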

library(dplyr)

df2 %>%
  rowwise() %>%
  mutate(Team = paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))) %>%
  inner_join(df1 %>%
               rowwise() %>%
               mutate(Team = paste0(min(PersonId1, PersonId2), ":", max(PersonId1, PersonId2))),
             by = c("Team", "Event")) %>%
  select(PersonId1 = PersonId1.x, PersonId2 = PersonId2.x,
         Played_together = Played_together.x, Event, Utility) %>%
  as.data.frame()
#   PersonId1 PersonId2 Played_together Event Utility
# 1        11         1               1     1      20
# 2        15         5               1     2      30
# 3         9        19               1     2      50
# 4         1        11               1     2      60
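rowwise() evaluates min() and max() one row at a time and builds the key by pasting strings, which can be slow on large tables. A possible vectorised variant of the same idea (a sketch, not part of the original answer) uses pmin() and pmax() to build two key columns instead of one string. The helper add_pair_key() and the columns Id_lo and Id_hi are hypothetical names. Like the answer above, it matches on the pair and Event only; selecting just the keys and Utility from df1 before the join also avoids the .x/.y suffixes.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(PersonId1 = c(1,2,3,4,5,6,7,8,9,10,1),
                  PersonId2 = c(11,12,13,14,15,16,17,18,19,20,11),
                  Played_together = c(1,0,0,1,1,0,0,0,1,0,1),
                  Event = c(1,1,1,1,2,2,2,2,2,2,2),
                  Utility = c(20,-2,-5,10,30,2,1,.5,50,-1,60))
df2 <- data.frame(PersonId1 = c(11,15,9,1), PersonId2 = c(1,5,19,11),
                  Played_together = c(1,1,1,1),
                  Event = c(1,2,2,2))

# Hypothetical helper: order-independent pair key from the element-wise
# smaller and larger of the two IDs (vectorised, no rowwise() needed).
add_pair_key <- function(d) {
  d %>% mutate(Id_lo = pmin(PersonId1, PersonId2),
               Id_hi = pmax(PersonId1, PersonId2))
}

result <- add_pair_key(df2) %>%
  inner_join(add_pair_key(df1) %>% select(Id_lo, Id_hi, Event, Utility),
             by = c("Id_lo", "Id_hi", "Event")) %>%
  select(PersonId1, PersonId2, Played_together, Event, Utility)
print(result)
```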

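@Adam Warner Thank you very much for your answer. It works really well. Just a quick newbie follow-up question: which part of your code handles the reversed PersonIds?
@Prometheus It's just a workaround, but you can see that in the first left join I specify "match PersonId2 to PersonId1", and then I bind another join that is not reversed. That's why, if you don't filter out the NAs, the reversed join gives you NA Utility values for rows where the PersonIds were not actually reversed.
Got it. For some reason I missed the PersonId2 = PersonId1 part. This is very helpful. Thanks.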