R: Moving variance with aggregation


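I have thermocouple data recorded at a 6-minute frequency. The thermocouples are installed at several heights, and at each height there is one thermocouple per radial position.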

DT_TI_RECORDED      HEIGHT      POS             TEMPERATURE
2018-05-16 00:00:00     1       90              111
2018-05-16 00:00:00     1       180             112
2018-05-16 00:00:00     1       270             113
2018-05-16 00:00:00     2       90              112
2018-05-16 00:00:00     2       180             114
2018-05-16 00:00:00     2       270             115
2018-05-16 00:00:00     3       90              112
2018-05-16 00:00:00     3       180             112
2018-05-16 00:00:00     3       270             113
...
2018-05-16 00:06:00     1       90              111
2018-05-16 00:06:00     1       180             112
2018-05-16 00:06:00     1       270             113
2018-05-16 00:06:00     2       90              112
2018-05-16 00:06:00     2       180             114
2018-05-16 00:06:00     2       270             112
2018-05-16 00:06:00     3       90              114
2018-05-16 00:06:00     3       180             112
2018-05-16 00:06:00     3       270             114
...
Every 6 minutes, for each unique combination of height and position, I want to compute a backward-looking n-hour moving variance, say 4 hours.

The original code I am trying to replicate was written for SAS:

    PROC EXPAND DATA=Raw_data
        OUT=Moving_Variance
        ALIGN = BEGINNING
    ;
    by HEIGHT POS;
    ID DT_TI_RECORDED ;
        CONVERT TEMPERATURE = Moving_4hour_Var /  METHOD = none TRANSFORMOUT = (MOVVAR 40); 
    /* 40 obs at 6 min freq = 4-hour moving variance */
    QUIT;
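I have spent hours searching Google. I think the R library I need is called zoo and the function I want is rollapply, but I can't figure out how to combine the aggregation with rollapply.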

I tried:

moving_var <- Raw_data %>%
              aggregate(HEIGHT,POS) %>%
              rollapply( TEMPERATURE, width = 40, FUN = sd, fill = NA)

But it doesn't work. I'm very new to R programming and this is driving me crazy.

Try the following aggregation:

library(zoo)

result = aggregate(temp ~ pos + height,
              data = df,
              FUN = function(x){
                  rollapply(x, width = 40, FUN = var, by = 40)
              }
)
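width is the width of the rolling window, and by is the number of points to move forward to reach the start of the next window. With width = 40 and by = 40, each window holds 40 points and begins right after the previous window ends, so the windows do not overlap.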
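To make the difference between width and by concrete, here is a minimal sketch on a made-up vector x (not the thermocouple data). It only illustrates how the two arguments interact; the values are assumptions chosen for the example.

```r
library(zoo)

# Hypothetical toy series, used only to illustrate the arguments
x <- c(1, 2, 4, 7, 11, 16)

# by = 1 (the default): the window slides one observation at a time,
# giving one variance per full window (4 windows for 6 points, width 3)
sliding <- rollapply(x, width = 3, FUN = var)

# by = 3: the window jumps 3 observations at a time,
# so the two windows cover points 1-3 and 4-6 without overlapping
jumping <- rollapply(x, width = 3, FUN = var, by = 3)

length(sliding)
length(jumping)
```

For a moving variance that is updated at every observation, leaving by at its default of 1 is the relevant case.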

The resulting data frame has one column per window. This structure can be considered "wide". If you want it in "long" format, use gather from tidyr or melt from reshape2.
For example:

df = structure(list(pos = c(0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 
                            180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 
                            0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 
                            270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 
                            180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 
                            0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 
                            270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 
                            180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 
                            0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 
                            270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 
                            180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 
                            0, 90, 180, 270), height = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
                            3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), temp = 1:160), .Names = c("pos", 
                            "height", "temp"), row.names = c(NA, -160L), class = "data.frame")

> head(df,20)
   pos height temp
1    0      1    1
2   90      1    2
3  180      1    3
4  270      1    4
5    0      2    5
6   90      2    6
7  180      2    7
8  270      2    8
9    0      3    9
10  90      3   10
11 180      3   11
12 270      3   12
13   0      4   13
14  90      4   14
15 180      4   15
16 270      4   16
17   0      1   17
18  90      1   18
19 180      1   19
20 270      1   20


library(zoo)

result = aggregate(temp ~ pos + height,
              data = df,
              FUN = function(x){
                  rollapply(x, width = 3, FUN = var, by = 3)
              }
)
results in:

   pos height temp.1 temp.2 temp.3
1    0      1    256    256    256
2   90      1    256    256    256
3  180      1    256    256    256
4  270      1    256    256    256
5    0      2    256    256    256
6   90      2    256    256    256
7  180      2    256    256    256
8  270      2    256    256    256
9    0      3    256    256    256
10  90      3    256    256    256
11 180      3    256    256    256
12 270      3    256    256    256
13   0      4    256    256    256
14  90      4    256    256    256
15 180      4    256    256    256
16 270      4    256    256    256

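Thanks, this is close to what I want. Is it possible to keep the timestamps? I don't want to skip any obs, i.e. every 6 minutes I want to look back over the previous 4 hours, so I think I can omit by and it should work.

Update: trying your code I get: Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument

Another approach you can try is to subset each combination of pos and height, so that each combination gets its own data frame, and then do a rolling apply on the temperature column.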
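The per-group approach suggested in the last comment can be sketched without building separate data frames by hand. This is only one possible way to do it, not the answerer's code: it uses base R's ave to apply zoo's rollapplyr (the right-aligned, i.e. backward-looking, variant of rollapply) within each HEIGHT/POS group, so every original row and its timestamp is kept. The data below is randomly generated for illustration, and the column name MOVING_4HOUR_VAR is an assumption borrowed from the SAS code. The seq.default error quoted above is what one would expect when a group has fewer rows than width, which is likely with a short sample of the data.

```r
library(zoo)

# Hypothetical toy data: 50 timestamps at 6-minute spacing,
# 2 heights x 3 positions, random temperatures
times <- seq(as.POSIXct("2018-05-16 00:00:00", tz = "UTC"),
             by = "6 min", length.out = 50)
raw <- data.frame(
  DT_TI_RECORDED = rep(times, each = 6),
  HEIGHT         = rep(rep(1:2, each = 3), times = 50),
  POS            = rep(c(90, 180, 270), times = 100)
)
set.seed(1)
raw$TEMPERATURE <- 110 + rnorm(nrow(raw))

# Rows must be in time order within each group for the window to make sense
raw <- raw[order(raw$HEIGHT, raw$POS, raw$DT_TI_RECORDED), ]

# Trailing 40-observation (4-hour) variance within each HEIGHT/POS group.
# fill = NA pads the first 39 rows of each group, so the output has the
# same length as the input and lines up with the timestamps.
raw$MOVING_4HOUR_VAR <- ave(
  raw$TEMPERATURE, raw$HEIGHT, raw$POS,
  FUN = function(x) rollapplyr(x, width = 40, FUN = var, fill = NA)
)

head(raw[!is.na(raw$MOVING_4HOUR_VAR), ])
```

Whether the leading rows of each group should be NA or computed from a shorter window depends on how the SAS output treats them, which the thread does not say; rollapply's partial argument is the option to look at if shorter windows are wanted.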