R: Merging data frames on exact matches by column


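I want to merge two data frames, one of which has more variables (columns) while the other has more observations (rows). Here is a simplified example of how they are set up: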

Data frame 1:

ID      Date         Indicator
12345   01/01/2008   1
54321   12/01/2008   1
Data frame 2:

ID      Date         
12345   01/01/2008   
12345   01/31/2008
12345   02/28/2009
24681   01/01/2008
54321   12/01/2008
54321   12/20/2008
What I want is to keep only the rows whose ID matches exactly. For example, I would like the following output:

New data frame:

ID      Date         Indicator     
12345   01/01/2008   1
12345   01/31/2008   NA
12345   02/28/2009   NA
54321   12/01/2008   1
54321   12/20/2008   NA
I tried

new <- merge(df1, df2, all=TRUE)

You can try a dplyr solution:

library(dplyr)
# a right join when you filter Dataframe2 by ID in Dataframe1
Dataframe1 %>% right_join(Dataframe2[Dataframe2$ID %in% Dataframe1$ID,])  

Joining, by = c("ID", "Date")
     ID       Date Indicator
1 12345 01/01/2008         1
2 12345 01/31/2008        NA
3 12345 02/28/2009        NA
4 54321 12/01/2008         1
5 54321 12/20/2008        NA

# clearly you can put it in a data.frame
Dataframe3 <- Dataframe1 %>%
  # join on both keys; joining on ID alone would produce Date.x and Date.y
  right_join(Dataframe2[Dataframe2$ID %in% Dataframe1$ID, ], by = c("ID", "Date")) %>%
  data.frame()

Edit, based on s_t's comment:

left_join(df2, df1, by=c("ID", "Date")) %>% filter(ID %in% df1$ID)

Consider combining subset with merge:

df3 <- subset(merge(df1, df2, by=c("ID", "Date"), all=TRUE), ID %in% df1$ID)

df3
#      ID       Date Indicator
# 1 12345 01/01/2008         1
# 2 12345 01/31/2008        NA
# 3 12345 02/28/2009        NA
# 5 54321 12/01/2008         1
# 6 54321 12/20/2008        NA
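
For anyone who wants to run this end to end, here is a minimal base R sketch. It rebuilds the question's sample data by hand (the dates are kept as plain character strings, which is an assumption about how they were stored) and uses all.y = TRUE instead of all = TRUE, which gives the same rows here because df2 already contains every row of df1.

```r
# Sample data copied from the question; dates kept as character strings
df1 <- data.frame(
  ID = c(12345, 54321),
  Date = c("01/01/2008", "12/01/2008"),
  Indicator = c(1, 1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(12345, 12345, 12345, 24681, 54321, 54321),
  Date = c("01/01/2008", "01/31/2008", "02/28/2009",
           "01/01/2008", "12/01/2008", "12/20/2008"),
  stringsAsFactors = FALSE
)

# Keep every row of df2 (all.y = TRUE), matching on both ID and Date
merged <- merge(df1, df2, by = c("ID", "Date"), all.y = TRUE)

# Drop IDs that never appear in df1 (here: 24681)
df3 <- merged[merged$ID %in% df1$ID, ]
rownames(df3) <- NULL  # renumber the rows after filtering
df3
```

Rows of df2 with no exact ID and Date match in df1 get NA in Indicator, and the filter on ID removes 24681.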
Try this:

library(dplyr)
df2 %>%
  left_join(df1, by = c("ID", "Date")) %>% # or full_join(df1, by = c("ID", "Date"))
  filter(ID %in% df1$ID) 
Or, building on what you started with:

merge(df1, df2, all = TRUE) %>% filter(ID %in% df1$ID)

You can try the join() function from the plyr library. You will also need an extra step to get the exact output you want.

library(plyr)

df1

     ID       Date Indicator
1 12345 2020-01-01         1
2 54321 2020-12-01         1

 df2

     ID       Date
1 12345 2020-01-01
2 12345 2020-01-31
3 12345 2020-02-28
4 24681 2020-01-01
5 54321 2020-12-01
6 54321 2020-12-20

# that extra step
df3 <- df2[df2$ID %in% df1$ID,]
df3
     ID       Date
1 12345 2020-01-01
2 12345 2020-01-31
3 12345 2020-02-28
5 54321 2020-12-01
6 54321 2020-12-20

join(df3, df1, by = c("ID", "Date"))
     ID       Date Indicator
1 12345 2020-01-01         1
2 12345 2020-01-31        NA
3 12345 2020-02-28        NA
4 54321 2020-12-01         1
5 54321 2020-12-20        NA
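
The output above shows Date in ISO form (yyyy-mm-dd), while the question's sample uses mm/dd/yyyy text. If you want real Date values before joining, a minimal sketch follows. It assumes the strings are month/day/year, and the small df2 here is only an illustrative subset of the question's data.

```r
# Assumption: the strings are month/day/year, as in the question's sample
df2 <- data.frame(
  ID = c(12345, 12345, 54321),
  Date = c("01/01/2008", "01/31/2008", "12/20/2008"),
  stringsAsFactors = FALSE
)

# Parse the text into Date objects; entries that do not fit the format become NA
df2$Date <- as.Date(df2$Date, format = "%m/%d/%Y")

str(df2)
```

Apply the same conversion to df1 so that the Date key has the same type in both data frames before you join.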

If the data is not too large, you can add one line that filters the result by df1$ID:

# the column is named ID (R is case sensitive), so new$id would be NULL
new <- new[new$ID %in% unique(df1$ID), ]

You are looking for a join. If the table you plan to keep as the reference is on the left, it is a left join. Example code:

    df1 <- data.frame(ID = c(12345, 54321),
                      Date = c('01/01/2008', '12/01/2008'),
                      Indicator = c(1, 1))

    df2 <- data.frame(ID = c(12345, 12345, 5341),
                      Date = c('01/01/2008', '12/01/2008', '12/1/2008'))

    # inner join on ID only, so both Date columns are kept as Date.x and Date.y
    merge(df1, df2, by.x = 'ID', by.y = 'ID')

      ID     Date.x       Indicator       Date.y
      12345 01/01/2008         1    01/01/2008
      12345 01/01/2008         1    12/01/2008
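
To see why a merge on ID alone differs from what the question asks for, here is a small base R sketch that compares it with a merge on both ID and Date. The data is a shortened, hand-typed version of the question's sample, and the names by_id and by_both are only illustrative.

```r
df1 <- data.frame(ID = c(12345, 54321),
                  Date = c("01/01/2008", "12/01/2008"),
                  Indicator = c(1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(12345, 12345, 54321),
                  Date = c("01/01/2008", "01/31/2008", "12/20/2008"),
                  stringsAsFactors = FALSE)

# Key is ID only: Date is not a key, so it appears twice (Date.x, Date.y)
# and Indicator is copied onto every df2 row that shares the ID
by_id <- merge(df1, df2, by = "ID")
names(by_id)

# Keys are ID and Date, keeping all rows of df2: a single Date column,
# and Indicator is NA wherever the exact ID/Date pair is missing from df1
by_both <- merge(df2, df1, by = c("ID", "Date"), all.x = TRUE)
names(by_both)
```

The second form is the one that matches the output requested in the question, once IDs that are absent from df1 are filtered out.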

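Comments:

merge(df1, df2, by = "ID", all = TRUE), according to this.

@Ryan Sorry, I changed it.

Thank you @foc. The OP does not seem to want ID 24681, and the join also appears to be on Date.

@s_t Thanks. I did not see the result. I am still learning how to answer.

I tried this, but it gives
ID Date Indicator 1 12345 01/01/2008 1 2 54321 12/01/2008 1
which does not seem to be what the OP needs.

Hi @sh2, did this help solve your problem, or do you need further help?

This does not seem to give the desired result, since it returns
ID Date Indicator 12345 01/01/2008 12/01/2008 1 54321 1

Yes, you are right! I misread the task.
left_join(df2, df1, by = c("ID", "Date")) %>% filter(ID %in% df1$ID)
is correct. It is very similar to your suggestion.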