R: Replace specific values based on another data frame


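First, let's start with data frame 1 (DF1), in which specific values have to be replaced by the matching values from a second data frame (DF2).
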
You can use the join functionality of the data.table package for this:

library(data.table)
setDT(DF1)
setDT(DF2)

DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
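This updates DF1 by reference: rows that match DF2 on date and id take their city and sales from DF2 (the i. prefix refers to the columns of DF2), and all other rows are left unchanged.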
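
To see the update join in isolation, here is a minimal sketch. The two small tables are hypothetical stand-ins, not the data from the question; only the join call itself is taken from the answer.

```r
library(data.table)

# Hypothetical example data (NOT the data from the question)
DF1 <- data.table(
  date  = c("06/19/2016", "06/20/2016", "06/19/2016"),
  id    = c(1L, 1L, 2L),
  sales = c(149, 150, 84),
  cost  = c(101, 102, 55),
  city  = c("MTL", "MTL", "NY")
)
DF2 <- data.table(
  date  = "06/19/2016",
  id    = 1L,
  sales = 9999,
  city  = "LON"
)

# Update join: only the rows of DF1 that match DF2 on date and id are changed;
# i.city and i.sales are the values coming from DF2
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]

print(DF1)
```

Because `:=` modifies DF1 in place, no assignment is needed, and the row count and column order of DF1 stay as they were.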


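When both datasets have many columns, it is easier to use mget than to type out all the column names. For the data used in the question, that looks as follows: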

DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
If you want to construct the vector of column names beforehand, you can do that as follows:

cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
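
For wide tables, positional indices such as 3:4 become hard to maintain. The sketch below derives the columns to overwrite by name instead. The data and the `keys` helper variable are hypothetical, and taking "every non-key column of DF2" is an assumption about what should be replaced.

```r
library(data.table)

# Hypothetical example data (NOT the data from the question)
DF1 <- data.table(date = c("d1", "d2"), id = c(1L, 1L),
                  sales = c(10, 20), city = c("A", "B"))
DF2 <- data.table(date = "d2", id = 1L, sales = 99, city = "Z")

# Join keys, and the columns to overwrite: here every non-key column of DF2
keys <- c("date", "id")
cols <- setdiff(names(DF2), keys)

# (cols) tells data.table to use the names stored in cols;
# a bare cols on the left-hand side would be read as a literal column name
DF1[DF2, on = keys, (cols) := mget(paste0("i.", cols))]

print(DF1)
```

This assumes that every column named in `cols` already exists in DF1 with a compatible type; a column that is missing from DF1 would be added rather than replaced.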

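By using merge, I only keep a limited number of rows. I have to keep the specific format of DF1; I can only replace values in DF1.

Use all.x=TRUE to keep all rows of DF1.

all.x is an argument of merge, not of ifelse ;-)

In the actual dataset, DF1 has 416 columns and DF2 has 321 columns.

If I try the mget strategy, but in two steps (building cols first), why do I get an error?

@t.r You need to put cols between parentheses, i.e. (cols); see the update to my answer.
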
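# Attempt from the question: all.x = TRUE is passed to ifelse() here, but it is an argument of merge()
df <- merge(DF1, DF2, by = c("date", "id"))
df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y, all.x = TRUE)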
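
Following the comments, a corrected version of this merge-based attempt might look like the sketch below. The data are hypothetical, and the column names sales and city stand in for the placeholder `column` used above.

```r
# Hypothetical example data (NOT the data from the question)
DF1 <- data.frame(date = c("d1", "d2", "d3"), id = c(1L, 1L, 1L),
                  sales = c(10, 20, 30), city = c("A", "B", "C"),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(date = "d2", id = 1L, sales = 99, city = "Z",
                  stringsAsFactors = FALSE)

# all.x = TRUE belongs in merge(): it keeps the rows of DF1 without a match in DF2
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

# Take DF2's value (.y) where there is one, otherwise keep DF1's value (.x)
df$sales <- ifelse(is.na(df$sales.y), df$sales.x, df$sales.y)
df$city  <- ifelse(is.na(df$city.y),  df$city.x,  df$city.y)

# Drop the .x/.y helper columns created by merge()
df <- df[, c("date", "id", "sales", "city")]

print(df)
```

Note that this treats an NA in DF2 the same as "no match", so a genuine NA in DF2 would not overwrite the value in DF1. merge() also returns the rows sorted by the key columns, which may differ from the original order of DF1.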
Result of the data.table join shown above:
> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC
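# Alternative with merge: keep every row of DF1 (all.x = TRUE)
df <- merge(DF1, DF2, by = c("date", "id"), all.x=TRUE)

# Rows without a match in DF2: keep the original sales and city from DF1
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y),]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"

# Rows with a match in DF2: take sales and city from DF2
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y),]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"

# Stack both parts again (rbindlist comes from data.table)
results <- rbindlist(list(tmp1,tmp2), use.names= TRUE, fill = TRUE)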