Merging data frames of different sizes with a condition in R
I am trying to merge two CSV files into one. They share a common id but have different numbers of rows. I used merge(), but the result contains duplicated rows. I have the following data frames:
SR <- c("SR1", "SR2", "SR2", "SR2", "SR3", "SR4", "SR4")
school <- c("S-1", "S-1", "S-2", "S-4", "S-2", "S-1", "S-5")
Y <- c(3,4,1,2,5,2,3)
data1 <- data.frame(SR, school, Y)
SR <- c("SR1", "SR1", "SR1", "SR2", "SR2", "SR2", "SR2", "SR2", "SR2", "SR2", "SR3", "SR3", "SR4", "SR4", "SR4")
class <- c("S-1.02", "S-1.05", "S-1.07", "S-1.01", "S-1.02", "S-1.03", "S-1.06", "S-2.03", "S-2.15", "S-4.02", "S-2.01", "S-2.03", "S-1.05", "S-1.06", "S-5.01")
data2 <- data.frame(SR, class)
Where the school matches, the result should look like this:
SR   school  class   Y
SR1  S-1     S-1.02  3
SR1  S-1     S-1.05  3
SR1  S-1     S-1.07  3
SR2  S-1     S-1.01  4
SR2  S-1     S-1.02  4
SR2  S-1     S-1.03  4
SR2  S-1     S-1.06  4
SR2  S-2     S-2.03  1
SR2  S-2     S-2.15  1
SR2  S-4     S-4.02  2
SR3  S-2     S-2.01  5
SR3  S-2     S-2.03  5
SR4  S-1     S-1.05  2
SR4  S-1     S-1.06  2
SR4  S-5     S-5.01  3
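For reference, the duplication described above is what merge() produces when the key repeats on both sides: every data1 row for an SR is paired with every data2 row for that SR. The base R sketch below reproduces this with the question's data and then keeps only the pairs where class starts with the school code followed by a dot. That prefix rule is an assumption read off the expected output, not something the question states explicitly.

```r
# Data from the question (stringsAsFactors = FALSE keeps the keys as character).
data1 <- data.frame(
  SR     = c("SR1", "SR2", "SR2", "SR2", "SR3", "SR4", "SR4"),
  school = c("S-1", "S-1", "S-2", "S-4", "S-2", "S-1", "S-5"),
  Y      = c(3, 4, 1, 2, 5, 2, 3),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  SR    = c("SR1", "SR1", "SR1", "SR2", "SR2", "SR2", "SR2", "SR2",
            "SR2", "SR2", "SR3", "SR3", "SR4", "SR4", "SR4"),
  class = c("S-1.02", "S-1.05", "S-1.07", "S-1.01", "S-1.02", "S-1.03",
            "S-1.06", "S-2.03", "S-2.15", "S-4.02", "S-2.01", "S-2.03",
            "S-1.05", "S-1.06", "S-5.01"),
  stringsAsFactors = FALSE
)

# Merging on SR alone pairs every data1 row with every data2 row per SR.
merged <- merge(data1, data2, by = "SR")
nrow(merged)  # 32 rows instead of the 15 that are wanted

# Assumption: a class belongs to a school when it starts with "<school>.".
keep   <- startsWith(merged$class, paste0(merged$school, "."))
result <- merged[keep, c("SR", "school", "class", "Y")]
result <- result[order(result$SR, result$class), ]
rownames(result) <- NULL
nrow(result)  # 15
```

Filtering after the merge is simple but builds the full cross product first, which may matter on large inputs.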
Thank you for your help.
Could you edit your question and use dput() to post your two data frames in a form that is easier for us to copy?
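To illustrate the request above: dput() prints R code that recreates an object, so readers can paste it straight into their own session. A minimal sketch with a small stand-in data frame (the values are placeholders, not the asker's full data):

```r
# Small stand-in data frame; replace with your own object.
df <- data.frame(
  SR     = c("SR1", "SR2"),
  school = c("S-1", "S-2"),
  Y      = c(3, 4),
  stringsAsFactors = FALSE
)

# Print a copy-pasteable definition of df to the console.
dput(df)

# Round trip: capture the printed text, evaluate it, compare with the original.
txt      <- paste(capture.output(dput(df)), collapse = "\n")
restored <- eval(parse(text = txt))
all.equal(df, restored)
```

For a large data frame, dput(head(df, 20)) keeps the pasted text short while preserving column types.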
That said, you will need to do something along these lines:
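# NOT RUN
library(tidyverse)
# Derive the school code from class (everything before the first "."),
# then join on SR and school so each class row picks up the matching Y.
RESULT <- data2 %>%
  mutate(school = str_extract(class, "^[^.]+")) %>%
  inner_join(data1, by = c("SR", "school"))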
One option is regex_left_join from fuzzyjoin:
library(fuzzyjoin)
library(dplyr)
regex_left_join(data2, data1, by = c("SR", "class" = "school")) %>%
select(SR = SR.x, school, class, Y)
#     SR school class Y
# 1  SR1    S-1 S-1.2 3
# 2  SR1    S-1 S-1.5 3
# 3  SR1    S-1 S-1.7 3
# 4  SR2    S-1 S-1.1 4
# 5  SR2    S-1 S-1.2 4
# 6  SR2    S-1 S-1.3 4
# 7  SR2    S-1 S-1.6 4
# 8  SR2    S-2 S-2.3 1
# 9  SR2    S-2 S-2.9 1
# 10 SR2    S-4 S-4.2 2
# 11 SR3    S-2 S-2.1 5
# 12 SR3    S-2 S-2.3 5
# 13 SR4    S-1 S-1.5 2
# 14 SR4    S-1 S-1.6 2
# 15 SR4    S-5 S-5.1 3
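One caveat worth checking before running this on larger data: regex_left_join treats the values in the right-hand column as regular expressions, so a school code is matched anywhere inside class unless the pattern is anchored. The base R sketch below shows the difference with grepl(). The class value "S-10.01" is hypothetical and does not occur in the question's data.

```r
classes <- c("S-1.02", "S-10.01", "S-2.15")  # "S-10.01" is a hypothetical value

# Unanchored: "S-1" is also found inside "S-10.01".
unanchored <- grepl("S-1", classes)
unanchored  # TRUE TRUE FALSE

# Anchored at the start and followed by a literal dot: only school S-1 matches.
anchored <- grepl("^S-1\\.", classes)
anchored  # TRUE FALSE FALSE

# Building such patterns from a school column (assumed to hold plain codes).
school  <- c("S-1", "S-2")
pattern <- paste0("^", school, "\\.")
```

If school codes in the real data can be prefixes of one another, a pattern column built this way could be used as the regex side of the join instead of school itself.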
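I have edited the question, so you can use the data now. Thank you very much, I will try this on the large data set.
@Me28 Please make sure the by variables in each data set are of the same class.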
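A quick way to check the point made in the last comment, sketched in base R: compare the class of the key column in both data frames and convert when they differ. The two small data frames are stand-ins, and one key is deliberately created as a factor.

```r
# Stand-in data: d1$SR is a factor, d2$SR is character.
d1 <- data.frame(SR = c("SR1", "SR2"), Y = c(3, 4), stringsAsFactors = TRUE)
d2 <- data.frame(SR = c("SR1", "SR1", "SR2"), stringsAsFactors = FALSE)

class(d1$SR)  # "factor"
class(d2$SR)  # "character"

# Convert the factor key to character so both sides agree before joining.
if (!identical(class(d1$SR), class(d2$SR))) {
  d1$SR <- as.character(d1$SR)
}
same_class <- identical(class(d1$SR), class(d2$SR))
```

Data read with read.csv() in R versions before 4.0 typically has character columns stored as factors, which is a common source of this mismatch.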