R: Replace NA values with a fractional increase of the previous value


If I have the following data frame:

library(tibble)  # provides tribble()

df <- tribble(
      ~year, ~value,
      2011, 10,
      2012, 15,
      2013, 20,
      2014, NA,
      2015, NA
    )

df

You can use Reduce in base R:
Reduce(function(x, y) if(is.na(y)) x * 1.1 else y, df$value, accumulate = TRUE)
#[1] 10.0 15.0 20.0 22.0 24.2


If you want a tidyverse solution, use accumulate:

library(dplyr)
library(purrr)

df %>% mutate(value = accumulate(value, ~if(is.na(.y)) .x * 1.1 else .y))

#   year value
#  <dbl> <dbl>
#1  2011  10  
#2  2012  15  
#3  2013  20  
#4  2014  22  
#5  2015  24.2


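In accumulate, .x and .y (or x and y in Reduce) are the accumulated value so far and the next element, respectively. For the first iteration .x is 10 and .y is 15; for the next iteration .x becomes 15 and .y becomes 20, and so on. We check whether the next value (.y) is NA: if it is, we replace it with 1.1 times the previous value (.x); if it is not NA, we keep it as is. Because .x carries the result of the previous step, consecutive NA values compound: 20 becomes 22, and 22 becomes 24.2.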
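To make the iteration order concrete, here is a minimal base R sketch that prints each (x, y) pair as Reduce walks the vector. The helper name step and the standalone vector values are illustrative assumptions, not part of the original answer.

```r
# Same values as the `value` column of the example data frame
values <- c(10, 15, 20, NA, NA)

# Hypothetical helper: same rule as the answer, plus a trace of each call
step <- function(x, y) {
  result <- if (is.na(y)) x * 1.1 else y
  cat(sprintf("x = %s, y = %s -> %s\n", format(x), format(y), format(result)))
  result
}

# accumulate = TRUE keeps every intermediate result, not just the last one
filled <- Reduce(step, values, accumulate = TRUE)
filled
```

The function is called four times for five elements, because the first element is used as the starting value and is never passed as y.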
A vectorised base R calculation using rep() and cumprod() (this assumes all the NA values come at the end of the vector):
multiplier <- 1.1
is_na <- is.na(df$value)
df$value[is_na] <- with(
  df, 
  tail(cumprod(c(tail(value[!is_na], 1), rep(multiplier, sum(is_na)))), -1)
)
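The snippet above takes the last non-NA value as its starting point, so it only suits trailing NAs. As a hedged sketch of one way to keep the calculation vectorised when NAs also occur in the middle, the following groups each run of NAs with the value that precedes it. The function name fill_growth is hypothetical, and the sketch assumes the first element is not NA.

```r
# Hypothetical helper; assumes vec[1] is not NA
fill_growth <- function(vec, multiplier = 1.1) {
  stopifnot(!is.na(vec[1]))
  # Each non-NA value starts a new group that also covers the NAs after it
  grp <- cumsum(!is.na(vec))
  # Position within the group: 0 for the known value, 1, 2, ... for the NAs
  k <- ave(seq_along(vec), grp, FUN = function(i) seq_along(i) - 1)
  # Known value that heads each group, repeated for every member of the group
  base <- vec[!is.na(vec)][grp]
  base * multiplier^k
}

fill_growth(c(10, 15, 20, NA, NA))
fill_growth(c(10, NA, NA, 20, NA))
```

Because this uses multiplier^k rather than repeated multiplication, results can differ from the Reduce version in the last floating-point digit, so compare with all.equal() rather than identical().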

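Or using recursion:
multiplier <- 1.1
cum_prod_estimate <- function(vec, multiplier = 1.1){
  if(all(!(is.na(vec)))){
    # Base case: nothing left to fill
    return(vec)
  }else{
    # Fill the first NA from its predecessor, then recurse on the result
    idx <- Position(is.na, vec)
    vec[idx] <- vec[idx-1] * multiplier
    # Pass multiplier along so a non-default value is not lost
    return(cum_prod_estimate(vec, multiplier))
  }
}

df$value <- cum_prod_estimate(df$value, multiplier)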
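The recursive function makes one call per NA and fails if the first element is NA, because there is no predecessor to multiply. A plain loop is a possible alternative; this is only a sketch, and the name fill_forward_growth is hypothetical.

```r
# Hypothetical loop-based equivalent of the recursive approach
fill_forward_growth <- function(vec, multiplier = 1.1) {
  # Start at the second element: the first has no predecessor
  for (i in seq_along(vec)[-1]) {
    if (is.na(vec[i])) {
      vec[i] <- vec[i - 1] * multiplier
    }
  }
  vec
}

fill_forward_growth(c(10, 15, 20, NA, NA))
# A leading NA is left untouched rather than raising an error
fill_forward_growth(c(NA, 5, NA))
```

Each filled value feeds the next iteration, so runs of consecutive NAs compound in the same way as in the Reduce and accumulate versions.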


Could you please explain what is happening in the second argument to accumulate in the tidyverse solution, in particular the use of ~ before if?
I have added some explanation to the main answer. ~ is a formula-based syntax that serves as an alternative to an anonymous function. In the base R option we use function(x, y); in purrr we do the same thing with ~.
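As a small illustration of that equivalence, the sketch below runs accumulate once with the formula shorthand and once with an ordinary anonymous function. The variable names are illustrative, and it assumes the purrr package is installed.

```r
library(purrr)

values <- c(10, 15, 20, NA, NA)

# Formula shorthand: .x is the accumulated value, .y is the next element
with_formula <- accumulate(values, ~ if (is.na(.y)) .x * 1.1 else .y)

# The same rule written as an explicit anonymous function
with_function <- accumulate(values, function(x, y) if (is.na(y)) x * 1.1 else y)

identical(with_formula, with_function)
```

Both calls perform the same arithmetic in the same order; the formula is simply converted into a two-argument function before accumulate applies it.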