R: replace specific values based on another data frame
Tags: r, dataframe, lookup, data.table
First, let's start with data frame 1 (DF1):
You can use the join functionality of the data.table package for this:
library(data.table)
setDT(DF1)
setDT(DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
This updates DF1 by reference; the resulting DF1 is printed further below.
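To make the update join easier to try out, here is a minimal, self-contained sketch. The two small tables are hypothetical stand-ins, since the question's full data is not reproduced here; only the join call itself comes from the answer.

```r
library(data.table)

# Hypothetical sample data, not the asker's real tables
DF1 <- data.table(
  date  = c("06/19/2016", "06/20/2016", "06/21/2016"),
  id    = c(1L, 1L, 1L),
  sales = c(149, 150, 151),
  cost  = c(101, 102, 104),
  city  = c("MTL", "MTL", "MTL")
)
DF2 <- data.table(
  date  = "06/19/2016",
  id    = 1L,
  sales = 9999,
  city  = "LON"
)

# Update join: for rows of DF1 that match DF2 on date and id,
# overwrite city and sales with DF2's values (the i. prefix refers to DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

print(DF1)
```

Rows of DF1 without a match in DF2 are left untouched, and no rows are added or dropped, which is what keeps the shape of DF1 intact.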
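When both datasets have many columns, it is easier to use mget instead of typing out all the column names. For the data used in the question, that looks as follows: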
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
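When you want to construct the vector of column names to be added beforehand, you can do so as follows. Note that cols must be wrapped in parentheses on the left-hand side of :=, so that data.table treats it as a variable holding the column names rather than as a new column called cols: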
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
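The role of the parentheses can be seen in a small sketch. The sample tables and the helper table DT are hypothetical, and using setdiff to pick the non-key columns is an assumption made here for illustration (the answer selects them by position with names(DF2)[3:4]).

```r
library(data.table)

# Hypothetical sample data
DF1 <- data.table(date = c("06/19/2016", "06/20/2016"), id = c(1L, 1L),
                  sales = c(149, 150), city = c("MTL", "MTL"))
DF2 <- data.table(date = "06/19/2016", id = 1L, sales = 9999, city = "LON")

# Columns to take over from DF2: everything except the join keys
cols <- setdiff(names(DF2), c("date", "id"))

# (cols) is evaluated as a variable, so this updates sales and city
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]

# Without parentheses, the left-hand side is taken as a literal column name
DT <- data.table(x = 1:2)
DT[, cols := 0]
names(DT)  # includes a column literally named "cols"
```

Selecting the columns by name rather than by position may be more robust when the column order of DF2 is not guaranteed.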
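By using merge, I only keep a limited number of rows. I have to keep the specific format of DF1; I can only replace values in DF1.
Use all.x=TRUE to keep all rows of DF1.
all.x is a parameter of merge, not of ifelse ;-)
In the actual dataset, DF1 has 416 columns and DF2 has 321 columns.
Why do I get an error if I try the mget strategy in two steps (defining cols first)?
@t.r You need to put cols between (); see the update of my answer.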
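# Left join; all.x = TRUE belongs to merge() and keeps all rows of DF1
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)
# Take the DF2 value (.y) where present, otherwise keep the DF1 value (.x)
df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y)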
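As a worked illustration of the merge-based route in base R, the following sketch uses hypothetical sample data with a single value column, sales. It assumes the default merge suffixes .x (from DF1) and .y (from DF2).

```r
# Hypothetical sample data
DF1 <- data.frame(date = c("06/19/2016", "06/20/2016", "06/21/2016"),
                  id = c(1L, 1L, 1L),
                  sales = c(149, 150, 151),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(date = "06/19/2016", id = 1L, sales = 9999,
                  stringsAsFactors = FALSE)

# all.x = TRUE is an argument of merge(): it keeps unmatched DF1 rows
# and fills DF2's columns with NA for them
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

# Prefer DF2's value (sales.y) where present, else keep DF1's (sales.x)
df$sales <- ifelse(is.na(df$sales.y), df$sales.x, df$sales.y)

# Drop the helper columns so the result has DF1's original columns
df$sales.x <- NULL
df$sales.y <- NULL
```

Note that merge sorts the result by the key columns, so the row order may differ from DF1, whereas the data.table update join leaves DF1's order as it was.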
DF1 after the update join:
> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC
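library(data.table)  # needed for rbindlist()
# Left join: keep every row of DF1; DF2's columns get the .y suffix
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)
# Rows without a match in DF2: keep DF1's values (.x)
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y), ]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"
# Rows with a match in DF2: keep DF2's values (.y)
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y), ]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"
# Stack both parts, matching columns by name
results <- rbindlist(list(tmp1, tmp2), use.names = TRUE, fill = TRUE)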
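The split-and-stack steps above could probably be shortened by coalescing the .y and .x columns directly. The sketch below is an alternative and not part of the original answers; it uses hypothetical sample data and data.table's fcoalesce, which requires the two columns to have the same type.

```r
library(data.table)

# Hypothetical sample data
DF1 <- data.table(date = c("06/19/2016", "06/20/2016"), id = c(1L, 1L),
                  sales = c(149, 150), city = c("MTL", "MTL"))
DF2 <- data.table(date = "06/19/2016", id = 1L, sales = 9999, city = "LON")

# Left join: unmatched DF1 rows get NA in the .y columns
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

# fcoalesce() returns the first non-missing value per row,
# so DF2's value wins where it exists
df[, sales := fcoalesce(sales.y, sales.x)]
df[, city := fcoalesce(city.y, city.x)]

# Remove the suffixed helper columns
df[, c("sales.x", "sales.y", "city.x", "city.y") := NULL]
```

One difference to keep in mind: a value that is genuinely NA in DF2 is treated here as "no replacement", so the DF1 value is kept for that cell.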