R: Merge data frames by exact match on a column
I want to merge two data frames, one of which has more variables (columns) while the other has more observations (rows). Here is a simplified example of how they are set up:
Data frame 1:
ID Date Indicator
12345 01/01/2008 1
54321 12/01/2008 1
Data frame 2:
ID Date
12345 01/01/2008
12345 01/31/2008
12345 02/28/2009
24681 01/01/2008
54321 12/01/2008
54321 12/20/2008
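What I want is to keep only the rows whose ID has an exact match in data frame 1. For example, I want the following output:
New data frame: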
ID Date Indicator
12345 01/01/2008 1
12345 01/31/2008 NA
12345 02/28/2009 NA
54321 12/01/2008 1
54321 12/20/2008 NA
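For anyone who wants to reproduce this, the two frames above can be rebuilt roughly as follows. This is only a sketch: it assumes Date is stored as a plain character string, and unmatched_ids is a name introduced here purely for illustration.

```r
# Rebuild the two example frames; Date is kept as character (an assumption).
df1 <- data.frame(
  ID = c(12345, 54321),
  Date = c("01/01/2008", "12/01/2008"),
  Indicator = c(1, 1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = c(12345, 12345, 12345, 24681, 54321, 54321),
  Date = c("01/01/2008", "01/31/2008", "02/28/2009",
           "01/01/2008", "12/01/2008", "12/20/2008"),
  stringsAsFactors = FALSE
)

# IDs that occur in df2 but not in df1; their rows should not appear in the result.
unmatched_ids <- setdiff(df2$ID, df1$ID)
unmatched_ids
```

Only ID 24681 has no counterpart in data frame 1, which is why its row is missing from the desired output.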
I tried
new <- merge(df1, df2, all=TRUE)
new
You can try a dplyr solution:
library(dplyr)
# a right join when you filter Dataframe2 by ID in Dataframe1
Dataframe1 %>% right_join(Dataframe2[Dataframe2$ID %in% Dataframe1$ID,])
Joining, by = c("ID", "Date")
ID Date Indicator
1 12345 01/01/2008 1
2 12345 01/31/2008 NA
3 12345 02/28/2009 NA
4 54321 12/01/2008 1
5 54321 12/20/2008 NA
# you can also store the result as a data.frame
Dataframe3 <- Dataframe1 %>% right_join(Dataframe2[Dataframe2$ID %in% Dataframe1$ID,], by = c('ID', 'Date')) %>%
data.frame()
Edit, following s_t's comment:
left_join(df2, df1, by=c("ID", "Date")) %>% filter(ID %in% df1$ID)
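As a self-contained variant of the same idea, here is a minimal sketch that filters with semi_join() instead of %in%. It assumes the dplyr package is installed and that the frames look like the ones in the question; result is just an illustrative name.

```r
library(dplyr)

# Example frames from the question (Date kept as character, an assumption).
df1 <- data.frame(ID = c(12345, 54321),
                  Date = c("01/01/2008", "12/01/2008"),
                  Indicator = c(1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(12345, 12345, 12345, 24681, 54321, 54321),
                  Date = c("01/01/2008", "01/31/2008", "02/28/2009",
                           "01/01/2008", "12/01/2008", "12/20/2008"),
                  stringsAsFactors = FALSE)

result <- df2 %>%
  semi_join(df1, by = "ID") %>%          # keep only rows whose ID occurs in df1
  left_join(df1, by = c("ID", "Date"))   # add Indicator where ID and Date both match

result
```

The semi_join() step plays the same role as filter(ID %in% df1$ID); which of the two reads better is mostly a matter of taste.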
Consider merge combined with subset:
df3 <- subset(merge(df1, df2, by=c("ID", "Date"), all=TRUE), ID %in% df1$ID)
df3
# ID Date Indicator
# 1 12345 01/01/2008 1
# 2 12345 01/31/2008 NA
# 3 12345 02/28/2009 NA
# 5 54321 12/01/2008 1
# 6 54321 12/20/2008 NA
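A possible variation on this, sketched under the same assumptions about the data, is to filter df2 first and then do a left merge, so the unwanted ID never enters the merge at all. The name df2_kept is made up for this example.

```r
# Example frames from the question (Date kept as character, an assumption).
df1 <- data.frame(ID = c(12345, 54321),
                  Date = c("01/01/2008", "12/01/2008"),
                  Indicator = c(1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(12345, 12345, 12345, 24681, 54321, 54321),
                  Date = c("01/01/2008", "01/31/2008", "02/28/2009",
                           "01/01/2008", "12/01/2008", "12/20/2008"),
                  stringsAsFactors = FALSE)

# Step 1: drop rows of df2 whose ID does not occur in df1.
df2_kept <- df2[df2$ID %in% df1$ID, ]

# Step 2: left merge on both key columns; unmatched dates get Indicator = NA.
df3 <- merge(df2_kept, df1, by = c("ID", "Date"), all.x = TRUE)
df3
```

Filtering before merging should give the same rows as subsetting afterwards, and the row names of the result then run consecutively instead of skipping the dropped row.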
Try this:
library(dplyr)
df2 %>%
left_join(df1, by = c("ID", "Date")) %>% # or full_join(df1, by = c("ID", "Date"))
filter(ID %in% df1$ID)
Or, building on what you started with:
merge(df1, df2, all = TRUE) %>% filter(ID %in% df1$ID)
You can try the join() function from the plyr library. You also need an extra step to get the exact output you want.
library(plyr)
df1
ID Date Indicator
1 12345 2020-01-01 1
2 54321 2020-12-01 1
df2
ID Date
1 12345 2020-01-01
2 12345 2020-01-31
3 12345 2020-02-28
4 24681 2020-01-01
5 54321 2020-12-01
6 54321 2020-12-20
# that extra step
df3 <- df2[df2$ID %in% df1$ID,]
df3
ID Date
1 12345 2020-01-01
2 12345 2020-01-31
3 12345 2020-02-28
5 54321 2020-12-01
6 54321 2020-12-20
join(df3, df1, by = c("ID", "Date"))
ID Date Indicator
1 12345 2020-01-01 1
2 12345 2020-01-31 NA
3 12345 2020-02-28 NA
4 54321 2020-12-01 1
5 54321 2020-12-20 NA
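If you would rather not load plyr, the same two steps can be sketched in base R with match(), which also keeps the rows in the order they have in df2. This is only an illustration: the dates are written as character strings here, and key1 and key3 are helper names invented for the example.

```r
# Frames as printed above (Date kept as character, an assumption).
df1 <- data.frame(ID = c(12345, 54321),
                  Date = c("2020-01-01", "2020-12-01"),
                  Indicator = c(1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(12345, 12345, 12345, 24681, 54321, 54321),
                  Date = c("2020-01-01", "2020-01-31", "2020-02-28",
                           "2020-01-01", "2020-12-01", "2020-12-20"),
                  stringsAsFactors = FALSE)

# The extra step: keep only rows of df2 whose ID occurs in df1.
df3 <- df2[df2$ID %in% df1$ID, ]

# Build a combined ID + Date key for each frame and look up Indicator by key.
key1 <- paste(df1$ID, df1$Date)
key3 <- paste(df3$ID, df3$Date)
df3$Indicator <- df1$Indicator[match(key3, key1)]  # NA where no key matches
df3
```

Because nothing is sorted, df3 keeps the original row names of df2, so the dropped row shows up as a gap in the numbering.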
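If the data is not too large, you can add one line that filters the result by df1$ID:
new <- new[new$ID %in% unique(df1$ID), ]
new
You are looking for a join. If the table you plan to keep as the reference is on the left, it is a left join. Example code: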
df1 <- data.frame(ID = c(12345, 54321), Date = c('01/01/2008', '12/01/2008'),
Indicator = c(1, 1))
df2 <- data.frame(ID = c(12345, 12345, 5341),
Date = c('01/01/2008', '12/01/2008', '12/1/2008'))
merge(df1, df2, by.x = 'ID', by.y = 'ID')
ID Date.x Indicator Date.y
12345 01/01/2008 1 01/01/2008
12345 01/01/2008 1 12/01/2008
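To see what merging on ID alone does with the data from the question, here is a small sketch. It assumes the question's frames rather than the shortened ones in this answer, and by_id is an illustrative name.

```r
# Example frames from the question (Date kept as character, an assumption).
df1 <- data.frame(ID = c(12345, 54321),
                  Date = c("01/01/2008", "12/01/2008"),
                  Indicator = c(1, 1),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ID = c(12345, 12345, 12345, 24681, 54321, 54321),
                  Date = c("01/01/2008", "01/31/2008", "02/28/2009",
                           "01/01/2008", "12/01/2008", "12/20/2008"),
                  stringsAsFactors = FALSE)

# Merging on ID only: Date is no longer a key, so it appears twice
# (Date.x from df1, Date.y from df2) and Indicator is copied to every row.
by_id <- merge(df1, df2, by = "ID")
names(by_id)
by_id
```

Every row with a matching ID then carries Indicator = 1, including dates that never appear in data frame 1, so Date has to be part of the key to get the NA values in the desired output.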
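merge(df1, df2, by = "ID", all = TRUE)
@Ryan Sorry, I changed it. Thank you.
@foc, the OP does not seem to want ID 24681, and the join also seems to be on Date.
@s_t Thanks. I had not looked at the result. I am trying to learn how to answer.
I have tried this, but it gives
ID Date Indicator
1 12345 01/01/2008 1
2 54321 12/01/2008 1
which does not seem to be what the OP needs.
Hi @sh2, did this help solve your problem, or do you need further help?
This does not seem to give the desired result, as it returns ID Date Indicator 12345 01/01/2008 12/01/2008 1 54321 1.
Yes, you are right! I misread the task. left_join(df2, df1, by = c("ID", "Date")) %>% filter(ID %in% df1$ID) is correct. It is very similar to your suggestion.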