R: Replacing missing values by combining with another row (r, data.table)

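I am reading data from different sources into a data.table. Two sources provide different variables for the same time step.

How can I replace the missing variables with the values from the other source (row)?

Expected output: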

   SourceCode time LE  R
1:          2    1 10 20
2:          2    2 10 30
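
The post does not show the input table itself. As a minimal sketch, the `gg` below is a hypothetical reconstruction that is consistent with the expected output above: source 1 is assumed to supply only `LE`, and source 2 only `R`. The actual data may well have looked different.

```r
library(data.table)

# Hypothetical input (not taken from the original post):
# two sources, two time steps, each source missing one variable.
gg <- data.table(
  SourceCode = c(1, 1, 2, 2),
  time       = c(1, 2, 1, 2),
  LE         = c(10, 10, NA, NA),
  R          = c(NA, NA, 20, 30)
)
```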

I recently discovered dplyr::coalesce(). A simple solution is:

library(dplyr)
coalesce(
  filter(gg, SourceCode == 2),
  filter(gg, SourceCode == 1)
)
#   SourceCode time LE  R
# 1          2    1 10 20
# 2          2    2 10 30

A more general approach, which works for any number of sources, is:

do.call(coalesce, split(gg, gg$SourceCode))
#    SourceCode time LE  R
# 1:          1    1 10 20
# 2:          1    2 10 30

If you want to use the second (or last) source as the base, you can do the following:

do.call(coalesce, rev(split(gg, gg$SourceCode)))
#    SourceCode time LE  R
# 1:          2    1 10 20
# 2:          2    2 10 30
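
The calls above pass whole tables to coalesce(). A column-wise variant may be easier to reason about, since coalesce() on plain vectors simply keeps the first non-missing value at each position. The sketch below assumes a hypothetical input `gg` (not shown in the post) in which both sources cover the same time steps; the names `s1`, `s2` and `res` are made up for illustration.

```r
library(dplyr)

# Hypothetical input: source 1 supplies LE, source 2 supplies R.
gg <- data.frame(
  SourceCode = c(1, 1, 2, 2),
  time       = c(1, 2, 1, 2),
  LE         = c(10, 10, NA, NA),
  R          = c(NA, NA, 20, 30)
)

# Split by source and sort by time so that the rows line up.
s1 <- gg %>% filter(SourceCode == 1) %>% arrange(time)
s2 <- gg %>% filter(SourceCode == 2) %>% arrange(time)

# The approach relies on both sources having identical time steps.
stopifnot(identical(s1$time, s2$time))

# Take source 2 as the base and fill its gaps from source 1.
res <- data.frame(
  SourceCode = s2$SourceCode,
  time       = s2$time,
  LE         = coalesce(s2$LE, s1$LE),
  R          = coalesce(s2$R, s1$R)
)
```

If the sources do not share exactly the same time steps, a join on `time` before coalescing would be the safer route.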

Since you seem to be working with data.tables, here is a data.table solution:

unique(gg[, `:=`(LE = LE[!is.na(LE)], R = R[!is.na(R)]), by = time], by = "time")
#   SourceCode time LE  R
#1:          1    1 10 20
#2:          1    2 10 30

Or, to keep the second source instead:

unique(gg[, `:=`(LE = LE[!is.na(LE)], R = R[!is.na(R)]), by = time], by = "time", fromLast = TRUE)
#   SourceCode time LE  R
#1:          2    1 10 20
#2:          2    2 10 30
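
One thing to keep in mind with the two calls above is that `:=` updates `gg` by reference, so the original table loses its NAs as a side effect. A minimal sketch that leaves `gg` untouched by working on a copy() follows; the input data and the names `filled` and `res` are assumptions made for illustration.

```r
library(data.table)

# Hypothetical input: source 1 supplies LE, source 2 supplies R.
gg <- data.table(
  SourceCode = c(1, 1, 2, 2),
  time       = c(1, 2, 1, 2),
  LE         = c(10, 10, NA, NA),
  R          = c(NA, NA, 20, 30)
)

# Fill each column within a time step on a copy, so gg itself is not modified.
# This assumes exactly one non-missing value per column and time step.
filled <- copy(gg)[, `:=`(LE = LE[!is.na(LE)], R = R[!is.na(R)]), by = time]

# Keep one row per time step, taking the last source as the base.
res <- unique(filled, by = "time", fromLast = TRUE)
```

If a time step could have several non-missing values, or none at all, the assignment would need an explicit rule such as taking the first non-missing value.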

Since SourceCode no longer seems relevant (you are summarising across different SourceCodes), you could also do:

gg[, lapply(.SD, function(x) x[!is.na(x)]), by = time, .SDcols = 3:4]
#   time LE  R
#1:    1 10 20
#2:    2 10 30
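
The summary above expects each column to have exactly one non-missing value per time step. As a hedged variant, the sketch below takes the first non-missing value and returns NA when a variable is missing in every source; `first_non_na` is a hypothetical helper and the input data (including the extra time step 3) is made up for illustration. It also selects the columns by name rather than by position.

```r
library(data.table)

# Hypothetical input: at time step 3, R is missing in both sources.
gg <- data.table(
  SourceCode = c(1, 1, 1, 2, 2, 2),
  time       = c(1, 2, 3, 1, 2, 3),
  LE         = c(10, 10, 10, NA, NA, NA),
  R          = c(NA, NA, NA, 20, 30, NA)
)

# Hypothetical helper: first non-missing value, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1L]

# One row per time step, one value per variable.
res <- gg[, lapply(.SD, first_non_na), by = time, .SDcols = c("LE", "R")]
```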

One option:

library(tidyverse)
dd %>% 
    gather(var, val, -SourceCode, -time) %>% 
    na.omit(val) %>% 
    spread(var, val)

#   SourceCode time LE  R
# 1          2    1 10 20
# 2          2    2 10 30
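
gather() and spread() have been superseded in tidyr by pivot_longer() and pivot_wider(). A rough equivalent of the reshape above with the newer functions might look like the sketch below. The contents of `dd` are not shown in the post, so the data here is an assumption, and SourceCode is dropped because in this hypothetical input it differs between the rows being merged.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: source 1 supplies LE, source 2 supplies R.
dd <- data.frame(
  SourceCode = c(1, 1, 2, 2),
  time       = c(1, 2, 1, 2),
  LE         = c(10, 10, NA, NA),
  R          = c(NA, NA, 20, 30)
)

res <- dd %>%
  # Long format: one row per (source, time, variable), missing values dropped.
  pivot_longer(c(LE, R), names_to = "var", values_to = "val",
               values_drop_na = TRUE) %>%
  # The source differs between the rows to be merged, so drop it.
  select(-SourceCode) %>%
  # Back to wide format: one row per time step.
  pivot_wider(names_from = var, values_from = val)
```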

Or another option based on grouping:

dd %>% 
    group_by(SourceCode, time) %>% 
    # funs() is deprecated since dplyr 0.8.0; a formula lambda does the same job
    summarise_at(vars(LE:R), ~ .x[!is.na(.x)])

#   SourceCode time LE  R
# 1          2    1 10 20
# 2          2    2 10 30

Note that I only added SourceCode to the group_by call to keep it in the summary. If you don't need that column, you can leave it out.

Please provide the code you have already tried. As it stands, your post does not contain a question.
