R: Replace specific values based on another data frame


First, let's start with data frame 1 (DF1):

You can use the join functionality of the data.table package for this:
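library(data.table)
setDT(DF1)  # convert both data frames to data.tables by reference
setDT(DF2)

# Update join: for each DF1 row whose (date, id) matches a DF2 row,
# overwrite city and sales with DF2's values (the i.-prefixed columns)
DF1[DF2, on = .(date, id), `:=`(city = i.city, sales = i.sales)]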
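To see the update join end to end, here is a minimal runnable sketch. The DF1 and DF2 below are hypothetical stand-ins (the question's own data is not reproduced here), built with stringsAsFactors = FALSE so the text columns stay character on older R versions:

```r
library(data.table)

# Hypothetical sample data, not the question's original data
DF1 <- data.frame(
  date  = c("06/19/2016", "06/20/2016", "06/19/2016", "06/20/2016"),
  id    = c(1, 1, 2, 2),
  sales = c(149, 150, 84, 83),
  cost  = c(101, 102, 55, 55),
  city  = c("MTL", "MTL", "NY", "NY"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  date  = c("06/19/2016", "06/20/2016"),
  id    = c(1, 2),
  sales = c(9999, 777),
  city  = c("LON", "QC"),
  stringsAsFactors = FALSE
)

setDT(DF1)
setDT(DF2)

# Rows of DF1 matching DF2 on (date, id) take DF2's city and sales;
# every other row and column is left as it was
DF1[DF2, on = .(date, id), `:=`(city = i.city, sales = i.sales)]
print(DF1)
```

Only the two rows whose (date, id) pair appears in DF2 change; cost and the non-matching rows keep their original values.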

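The update join modifies DF1 in place: only rows with a matching (date, id) pair in DF2 receive new values, and all other rows and columns are left untouched.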

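When both data sets have many columns, it is easier to use mget instead of typing out all the column names. For the data used in the question, that looks like this: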
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
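With hundreds of columns, hard-coding positions such as names(DF2)[3:4] is fragile. One option is to select the columns by excluding the join keys. A sketch under the assumption that every non-key column of DF2 also exists in DF1; the data and the variable name upd are hypothetical:

```r
library(data.table)

# Hypothetical data; the real tables may have hundreds of columns
DF1 <- data.table(date = c("06/19/2016", "06/20/2016"), id = c(1, 2),
                  sales = c(149, 83), cost = c(101, 55), city = c("MTL", "NY"))
DF2 <- data.table(date = "06/20/2016", id = 2, sales = 777, city = "QC")

# Every non-key column of DF2, chosen by name rather than by position
# (assumes each of them also exists in DF1)
upd <- setdiff(names(DF2), c("date", "id"))

# Overwrite those columns in DF1 with the matching i.-prefixed DF2 columns
DF1[DF2, on = .(date, id), (upd) := mget(paste0("i.", upd))]
```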

If you want to build the vector of column names beforehand, you can do it as follows:
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
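The same pattern can be wrapped in a small function for repeated use. update_from() below is a hypothetical helper, not part of data.table; it updates only the non-key columns the two tables share, so columns that exist only in DF2 are ignored. The parentheses around cols make := use the names stored in cols rather than creating a column literally called "cols":

```r
library(data.table)

# Hypothetical helper (not a data.table function): update x in place from y
# wherever the key columns match
update_from <- function(x, y, keys) {
  # Non-key columns present in both tables; y-only columns are skipped
  cols <- setdiff(intersect(names(x), names(y)), keys)
  # (cols) := assigns to the columns named inside cols
  x[y, on = keys, (cols) := mget(paste0("i.", cols))]
  invisible(x)
}

# Hypothetical data; DF2 has an extra column that DF1 lacks
DF1 <- data.table(date = c("06/19/2016", "06/20/2016"), id = c(1, 2),
                  sales = c(149, 83), cost = c(101, 55), city = c("MTL", "NY"))
DF2 <- data.table(date = "06/19/2016", id = 1, sales = 9999,
                  city = "LON", note = "only in DF2")

update_from(DF1, DF2, keys = c("date", "id"))  # modifies DF1 by reference
```

Because := works by reference, the caller's DF1 is changed directly; the invisible return value only allows chaining.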

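Using merge, I only keep a limited number of rows. I have to keep the specific format of DF1; I can only replace values in DF1.

Use all.x = TRUE to keep all rows of DF1.

all.x is an argument of merge, not of ifelse ;-)

In the actual data set, DF1 has 416 columns and DF2 has 321 columns. If I try the mget strategy but in two steps, why do I get an error with cols?

@t.r You need to put cols between (); see the update to my answer.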
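The merge-based attempt from the question, with all.x passed to merge() instead of ifelse():
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)  # keep every row of DF1
# "column" stands for a column present in both DF1 and DF2 (suffixed .x/.y after the merge)
df$newcolumn <- ifelse(is.na(df$column.y), df$column.x, df$column.y)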
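The question's ifelse idea works once all.x = TRUE is passed to merge(), and it can be generalized to every updated column in base R. A minimal sketch with hypothetical data; it assumes DF2 contains no legitimate NA values, since an NA coming from DF2 looks the same as "no match" here and DF1's value is kept:

```r
# Hypothetical data (plain data.frames, no packages needed)
DF1 <- data.frame(date = c("d1", "d2", "d3"), id = c(1, 1, 2),
                  sales = c(10, 20, 30), cost = c(5, 6, 7),
                  city = c("A", "B", "C"), stringsAsFactors = FALSE)
DF2 <- data.frame(date = c("d2", "d3"), id = c(1, 2),
                  sales = c(99, 77), city = c("Y", "Z"),
                  stringsAsFactors = FALSE)

keys <- c("date", "id")
cols <- setdiff(names(DF2), keys)  # columns to overwrite

# all.x = TRUE keeps every row of DF1; shared columns get .x/.y suffixes
df <- merge(DF1, DF2, by = keys, all.x = TRUE)

for (col in cols) {
  x <- df[[paste0(col, ".x")]]
  y <- df[[paste0(col, ".y")]]
  # Take DF2's value where the row matched, otherwise keep DF1's
  df[[col]] <- ifelse(is.na(y), x, y)
  df[[paste0(col, ".x")]] <- NULL
  df[[paste0(col, ".y")]] <- NULL
}

# Restore DF1's column order
df <- df[, names(DF1)]
```

merge() sorts the result by the key columns (sort = TRUE by default), so DF1's original row order is not guaranteed; ifelse() also drops attributes, so Date or factor columns would need extra care.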

Running the data.table join update on the question's data gives:

> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC

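Alternatively, with merge(): keep all rows of DF1 with all.x = TRUE, then take DF2's values for the rows that found a match:
library(data.table)  # only needed for rbindlist()

# all.x = TRUE keeps every row of DF1; columns in both get .x/.y suffixes
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

# Rows with no match in DF2: keep DF1's values (the .x columns)
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y), ]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"

# Rows with a match in DF2: use DF2's values (the .y columns)
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y), ]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"

# Stack both parts again, matching columns by name
results <- rbindlist(list(tmp1, tmp2), use.names = TRUE, fill = TRUE)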
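The merge-and-split approach above places unmatched rows before matched ones and drops any row where only one of sales.y and city.y is NA. If DF1's row order and layout must be preserved exactly, one alternative is to look up each DF1 row in DF2 with match() and overwrite in place. A base-R sketch with hypothetical data; it assumes "|" never occurs in the key values and that (date, id) is unique in DF2 (match() returns only the first hit):

```r
# Hypothetical data; DF1 is deliberately not sorted by its keys
DF1 <- data.frame(date = c("d3", "d1", "d2"), id = c(2, 1, 1),
                  sales = c(30, 10, 20), cost = c(7, 5, 6),
                  city = c("C", "A", "B"), stringsAsFactors = FALSE)
DF2 <- data.frame(date = c("d2", "d3"), id = c(1, 2),
                  sales = c(99, 77), city = c("Y", "Z"),
                  stringsAsFactors = FALSE)

# Composite key per row (assumes "|" never appears in date or id)
key1 <- paste(DF1$date, DF1$id, sep = "|")
key2 <- paste(DF2$date, DF2$id, sep = "|")

idx <- match(key1, key2)  # matching DF2 row for each DF1 row, or NA
hit <- !is.na(idx)

# Overwrite only the matched rows; row order and other columns are untouched
for (col in setdiff(names(DF2), c("date", "id"))) {
  DF1[[col]][hit] <- DF2[[col]][idx[hit]]
}
```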