R: calculate the relative change from baseline of a time series by group, NA if the baseline value was not measured_R_Dplyr_Time Series

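I want to use dplyr to calculate, group by group, the relative change of a measured variable in a data.frame. The changes are relative to the first baseline value, measured at `time==0`.

In the following example I can do this easily: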

 # with this easy example it works 
 df.easy <- data.frame( id  =c(1,1,1,2,2,2)
                   ,time=c(0,1,2,0,1,2)
                   ,meas=c(5,6,9,4,5,6))

 df.easy %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative =
 meas/meas[time==0])
     # Source: local data frame [6 x 4]
     # Groups: id [2]
     # 
     #      id  time  meas meas.relative
     #   <dbl> <dbl> <dbl>         <dbl>
     # 1     1     0     5          1.00
     # 2     1     1     6          1.20
     # 3     1     2     9          1.80
     # 4     2     0     4          1.00
     # 5     2     1     5          1.25
     # 6     2     2     6          1.50
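
The snippets that follow refer to a data frame `df` whose definition does not appear in this text. Judging from the output printed in the answer, it presumably extends `df.easy` with a third id that has no `time==0` row. The sketch below is a reconstruction under that assumption, together with what looks like the first attempt discussed later (guarding the division with `ifelse`); the name `attempt` is only illustrative.

```r
library(dplyr)

# Assumed reconstruction of `df`: ids 1 and 2 as in df.easy, plus an id 3
# without a time == 0 row (values taken from the answer's printed output).
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 time = c(0, 1, 2, 0, 1, 2, 1, 2),
                 meas = c(5, 6, 9, 4, 5, 6, 5, 6))

# Presumed first attempt: guard the division with a length-one test per group.
attempt <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(meas.relative = ifelse(any(time == 0), meas / meas[time == 0], NA)) %>%
  dplyr::ungroup()

print(attempt$meas.relative)
```

With this data, ids 1 and 2 come out as 1 in every row and id 3 as NA, which is the behaviour the question goes on to ask about.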

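Wait, why is the relative measurement 1 in every row once I wrap the calculation in `ifelse(any(time==0), meas/meas[time==0], NA)`?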

identical(
    df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas, NA) ),
    df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.relative = ifelse(any(time==0), meas[time==0], NA) )
    )
    # TRUE

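It seems that `ifelse` prevents `meas` from picking the current row and instead always selects the subset where `time==0`.

How can I calculate relative changes when there are IDs with no baseline measurement?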

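Your problem is in `ifelse()`. According to the `ifelse` documentation, it returns a vector of the same length as `test`. Since `any(time==0)` has length 1 within each group (`TRUE` or `FALSE`), only the first observation of `meas/meas[time==0]` is selected. That single value is then recycled to fill each group.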
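
To make the length issue concrete, here is a minimal base R sketch that works on the values of a single group (id 1 from the example). The variable names `ratio`, `test` and `short` are only illustrative.

```r
# One group's values from the example (id == 1)
time <- c(0, 1, 2)
meas <- c(5, 6, 9)

ratio <- meas / meas[time == 0]   # full-length vector, one ratio per row
test  <- any(time == 0)           # a single logical value for the whole group

# ifelse() shapes its result after `test`, so only ratio[1] survives
short <- ifelse(test, ratio, NA)

print(length(ratio))   # 3
print(length(short))   # 1
print(short)           # 1
```

Inside a grouped `mutate`, that length-one result is what gets recycled across the group's rows.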

To fix this, all I did was use `rep` to repeat the result of `any()` to the length of the group. I think this should work:

df %>% dplyr::group_by(id) %>% 
       dplyr::mutate(meas.relative = ifelse(rep(any(time==0),times = n()), meas/meas[time==0], NA) )

  #       id  time  meas meas.relative
  #    <dbl> <dbl> <dbl>         <dbl>
  #  1     1     0     5          1.00
  #  2     1     1     6          1.20
  #  3     1     2     9          1.80
  #  4     2     0     4          1.00
  #  5     2     1     5          1.25
  #  6     2     2     6          1.50
  #  7     3     1     5            NA
  #  8     3     2     6            NA
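
As a possible alternative that is not part of the original answer: because the condition is a single value per group, a scalar `if`/`else` can be used in place of the vectorised `ifelse`. The sketch below assumes `df` has the eight rows shown in the output above and that each group has at most one `time==0` row; `res` is just an illustrative name.

```r
library(dplyr)

# Assumed input, matching the eight rows printed above
df <- data.frame(id   = c(1, 1, 1, 2, 2, 2, 3, 3),
                 time = c(0, 1, 2, 0, 1, 2, 1, 2),
                 meas = c(5, 6, 9, 4, 5, 6, 5, 6))

# if/else is evaluated once per group: the TRUE branch returns the whole
# ratio vector, the FALSE branch a single NA_real_ that mutate() recycles.
res <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(meas.relative = if (any(time == 0)) meas / meas[time == 0] else NA_real_) %>%
  dplyr::ungroup()

print(res)
```

Using `NA_real_` rather than a bare `NA` keeps the column type numeric in every group.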

Edit: a `data.table` solution using the same idea:

as.data.table(df)[, meas.rel := ifelse(rep(any(time==0), .N), meas/meas[time==0], NA_real_)
                  ,by=id]

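A workaround is the following two-step solution: `df <- df %>% dplyr::group_by(id) %>% dplyr::mutate(meas.baseline = ifelse(any(time==0), meas[time==0], NA))` followed by `dplyr::mutate(df, meas.relative = meas/meas.baseline)`.

In data.table, I believe you could do this with `setDT(df)[, meas.rel := (meas)/meas[time==0], by=id]`, though I'm not 100% sure of your desired output.

Thanks @Imo for the reply. However, I haven't tested it yet because I don't have data.table on my work machine. Does it return `NA` if `meas[time==0]` does not exist in the group?

Yes. The last two observations in this example are NA, and the ratio is calculated for the other values.

Wow, amazing: a quick answer and a clear explanation of what went wrong. Thanks a lot.

No problem, glad to help!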

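A quick check that `ifelse` with a length-one test returns only the first element of a longer `yes` vector:

ifelse(TRUE,c(1,2,3),NA)
#[1] 1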
