Replace NA with the group mean of the data using tidyverse


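I am trying to write a function that replaces the NAs in numeric data.frame columns with the mean of that variable, computed by group. I realize this is a form of imputation and that there are packages for it, but I would prefer to do it myself; the mean is only an example, and I will be using more complex functions. I tried to build an MWE but got stuck near the end. I am trying to stick to tidyverse methods as much as possible.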

library(tidyverse)
## First create a little dataset for a minimum working example for questions
## three vectors
id <- c(rep("boh1", 6), rep("boh2", 6), rep("boh3", 6), rep("boh4", 6))
operator <- rep(c("op1", "op2"), each = 12)
nummos <- c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4, NA, 1, 1, 5,
                     5, 4, 5, 3, 2, NA, 3, 3)
## combine vectors into df
dat1 <- data.frame(id, operator, nummos)
## group by two variables and get mean of variable by group
dat2 <- dat1 %>%
    group_by(id, operator) %>%
    summarize(mean = mean(nummos, na.rm=TRUE))
## now stuck, how to replace NA by mean value appropriate for that group?

Use mutate and dplyr::case_when instead of summarize:

dat1 %>%
    group_by(id, operator) %>%
    mutate(nummos2 = case_when(is.na(nummos) ~ mean(nummos, na.rm = TRUE),
                               TRUE ~ as.numeric(nummos)
    )
    )
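
As the comments on this answer point out, with a single condition the same result can be had from base ifelse() or from dplyr::coalesce(). The following is a minimal sketch, assuming the dat1 from the question; the column names nummos_ifelse and nummos_coalesce are made up for illustration.

```r
library(dplyr)

# Same example data as in the question
dat1 <- data.frame(
  id = rep(c("boh1", "boh2", "boh3", "boh4"), each = 6),
  operator = rep(c("op1", "op2"), each = 12),
  nummos = c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4,
             NA, 1, 1, 5, 5, 4, 5, 3, 2, NA, 3, 3)
)

filled <- dat1 %>%
  group_by(id, operator) %>%
  mutate(
    # base R: keep the value unless it is NA, then use the group mean
    nummos_ifelse = ifelse(is.na(nummos), mean(nummos, na.rm = TRUE), nummos),
    # dplyr: take the first non-missing value of (nummos, group mean)
    nummos_coalesce = coalesce(nummos, mean(nummos, na.rm = TRUE))
  ) %>%
  ungroup()
```

Both columns should hold the same values as nummos2 from the case_when version; which one to use is mostly a matter of taste.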
I'm not very familiar with the tidyverse, so here is a data.table solution:
library(data.table) # load package
setDT(dat1) # convert data.frame to data.table

Now I create a data.table with the mean of nummos by id and operator, and join it with dat1 to fill the NAs with the computed values:
dat1[dat1[, mean(nummos, na.rm = TRUE), by = .(id, operator)], nummos := ifelse(is.na(nummos), i.V1, nummos), on = .(id, operator)]

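dat1[, mean(nummos, na.rm = TRUE), by = .(id, operator)] is a small data.table containing the mean per group. Because the computed column is not named, data.table calls it V1, which is why the join refers to it as i.V1.
The nummos := ifelse… part writes the group mean back only where nummos is NA; all other values are left unchanged.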
dat1
      id operator nummos
 1: boh1      op1    1.0
 2: boh1      op1    4.0
 3: boh1      op1    4.0
 4: boh1      op1    3.0
 5: boh1      op1    1.0
 6: boh1      op1    2.6
 7: boh2      op1    4.0
 8: boh2      op1    2.0
 9: boh2      op1    2.0
10: boh2      op1    3.0
11: boh2      op1    4.0
12: boh2      op1    4.0
13: boh3      op2    3.2
14: boh3      op2    1.0
15: boh3      op2    1.0
16: boh3      op2    5.0
17: boh3      op2    5.0
18: boh3      op2    4.0
19: boh4      op2    5.0
20: boh4      op2    3.0
21: boh4      op2    2.0
22: boh4      op2    3.2
23: boh4      op2    3.0
24: boh4      op2    3.0
      id operator nummos


You can simply define your own function using replace(). Try:
dat1 %>% 
        group_by(id, operator) %>% 
        mutate_at("nummos", function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
# output
# A tibble: 24 x 3
# Groups:   id, operator [4]
   id    operator nummos
   <fct> <fct>     <dbl>
 1 boh1  op1         1  
 2 boh1  op1         4  
 3 boh1  op1         4  
 4 boh1  op1         3  
 5 boh1  op1         1  
 6 boh1  op1         2.6
 7 boh2  op1         4  
 8 boh2  op1         2  
 9 boh2  op1         2  
10 boh2  op1         3  
# ... with 14 more rows
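
Since the question asks for a function in which the mean is only a placeholder, the replace() idea can be wrapped so that the summary function becomes an argument. This is a sketch rather than a definitive implementation: it assumes dplyr 1.0.0 or later (for across()), and the helper name impute_by_group and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: fill NAs in `cols` with `fun` applied to the
# non-missing values of each group defined by `groups`
impute_by_group <- function(data, groups, cols, fun = mean) {
  data %>%
    group_by(across(all_of(groups))) %>%
    mutate(across(all_of(cols),
                  ~ replace(.x, is.na(.x), fun(.x[!is.na(.x)])))) %>%
    ungroup()
}

# Same example data as in the question
dat1 <- data.frame(
  id = rep(c("boh1", "boh2", "boh3", "boh4"), each = 6),
  operator = rep(c("op1", "op2"), each = 12),
  nummos = c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4,
             NA, 1, 1, 5, 5, 4, 5, 3, 2, NA, 3, 3)
)

res <- impute_by_group(dat1, groups = c("id", "operator"), cols = "nummos")
```

Passing another function, for example fun = median, swaps the statistic without touching the grouping logic.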

Another solution, using the (very new) nafill function:
library(data.table)
setDT(dat1)

dat1[, nummos := nafill(nummos, "const", fill = mean(nummos, na.rm = TRUE))
     , by = .(id, operator)]


And a solution using na.aggregate from the zoo package:
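
The code for this variant is not shown. A minimal sketch of what it could look like, assuming the zoo package is installed and using the dat1 from the question, follows; na.aggregate() replaces NAs with the result of FUN on the non-missing values, and FUN defaults to mean.

```r
library(data.table)
library(zoo)

# Same example data as in the question
dat1 <- data.frame(
  id = rep(c("boh1", "boh2", "boh3", "boh4"), each = 6),
  operator = rep(c("op1", "op2"), each = 12),
  nummos = c(1, 4, 4, 3, 1, NA, 4, 2, 2, 3, 4, 4,
             NA, 1, 1, 5, 5, 4, 5, 3, 2, NA, 3, 3)
)
setDT(dat1)

# Fill NAs with the group mean (FUN = mean is the default)
dat1[, nummos := na.aggregate(nummos), by = .(id, operator)]

# To use a different statistic, pass FUN explicitly, e.g.:
# dat1[, nummos := na.aggregate(nummos, FUN = median), by = .(id, operator)]
```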
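If you're only testing one condition, what is the benefit of case_when over base ifelse?
That's true, and I believe both perform the same. You can stay within the tidyverse or use base here.
Seems to work very well, thx.
Sure, but just because you are using other dplyr functions doesn't mean you shouldn't also use base functions, especially where the base function is simpler or more appropriate (e.g. for checking one condition rather than multiple conditions, which is what case_when was designed for).
@JimMaas You want me to reopen the question? I can't. You can look at the references on the duplicates. Closing a question as a duplicate is not necessarily a bad thing.
Jaap, how does this function know that I want to replace with the mean? Thx
@JimMaas The default is mean; if you want to use another function, you should specify that. See also the help file: ?zoo::na.aggregate
Jaap, thanks for your help, it is very useful. Could you remove the comments above that suggest this had been answered before? I found those two answers and, with my intermediate ability, could not work out any of the four very good solutions that were volunteered here. If we were all already experts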