Moving variance with aggregation in R
I have some thermocouple data recorded at a 6-minute frequency. The thermocouples are installed at different heights, and at each height there is one thermocouple per radial position:
DT_TI_RECORDED HEIGHT POS TEMPERATURE
2018-05-16 00:00:00 1 90 111
2018-05-16 00:00:00 1 180 112
2018-05-16 00:00:00 1 270 113
2018-05-16 00:00:00 2 90 112
2018-05-16 00:00:00 2 180 114
2018-05-16 00:00:00 2 270 115
2018-05-16 00:00:00 3 90 112
2018-05-16 00:00:00 3 180 112
2018-05-16 00:00:00 3 270 113
...
2018-05-16 00:06:00 1 90 111
2018-05-16 00:06:00 1 180 112
2018-05-16 00:06:00 1 270 113
2018-05-16 00:06:00 2 90 112
2018-05-16 00:06:00 2 180 114
2018-05-16 00:06:00 2 270 112
2018-05-16 00:06:00 3 90 114
2018-05-16 00:06:00 3 180 112
2018-05-16 00:06:00 3 270 114
...
Every 6 minutes, for each unique combination of height and position, I want to compute a backward-looking n-hour moving variance, say 4 hours.
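Because readings are 6 minutes apart, a 4-hour look-back covers 4 × 60 / 6 = 40 observations. Below is a minimal sketch of a backward (trailing) window on a single series, using zoo's rollapplyr, the right-aligned variant of rollapply. Plain rollapply centers the window by default, so it would not look strictly backwards. The series x is made-up illustration data, and width = 3 is used only to keep the output short:

```r
library(zoo)

# Look-back length in observations: 4 hours of readings taken every 6 minutes
width_4h <- 4 * 60 / 6   # 40

# Made-up readings for one HEIGHT/POS combination (illustration only)
x <- c(111, 112, 113, 112, 114, 115)

# rollapplyr aligns each window on its last element, so every value uses the
# current reading and the ones before it; fill = NA pads rows without a full window
trailing_var <- rollapplyr(x, width = 3, FUN = var, fill = NA)
trailing_var
```

The first two entries are NA because they do not yet have three readings behind them. With the real data you would use width = width_4h.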
The original code I am trying to replicate was written in SAS:
PROC EXPAND DATA=Raw_data
OUT=Moving_Variance
ALIGN = BEGINNING
;
by HEIGHT POS;
ID DT_TI_RECORDED ;
CONVERT TEMPERATURE = Moving_4hour_Var / METHOD = none TRANSFORMOUT = (MOVVAR 40);
/* 40 obs at 6-min frequency = 4-hour moving variance */
QUIT;
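For comparison, here is a rough R analogue of this PROC EXPAND step. It is a sketch that assumes the column names from the question and generates made-up data in the same layout. Base R's ave() splits TEMPERATURE by HEIGHT and POS and applies a trailing 40-reading variance to each group with zoo's rollapplyr. It then writes the results back in the original row order, so each value stays on the row with its DT_TI_RECORDED timestamp. The first 39 rows of each group are NA here. PROC EXPAND may treat the start of each series differently, so compare those rows before relying on the match.

```r
library(zoo)

# Made-up data in the question's layout: 3 heights x 3 positions,
# one reading every 6 minutes for 10 hours (100 timestamps per group)
set.seed(1)
times <- seq(as.POSIXct("2018-05-16 00:00:00", tz = "UTC"), by = "6 min", length.out = 100)
Raw_data <- expand.grid(POS = c(90, 180, 270), HEIGHT = 1:3, DT_TI_RECORDED = times)
Raw_data$TEMPERATURE <- 110 + rnorm(nrow(Raw_data))

# Order by group, then time, so each group's readings are chronological
Raw_data <- Raw_data[order(Raw_data$HEIGHT, Raw_data$POS, Raw_data$DT_TI_RECORDED), ]

# Trailing variance of the current reading and the 39 before it, per HEIGHT/POS;
# rollapplyr aligns windows on the right, fill = NA pads rows without a full window
Raw_data$Moving_4hour_Var <- ave(
  Raw_data$TEMPERATURE, Raw_data$HEIGHT, Raw_data$POS,
  FUN = function(x) rollapplyr(x, width = 40, FUN = var, fill = NA)
)

head(Raw_data[!is.na(Raw_data$Moving_4hour_Var), ])
```

Unlike aggregate(), which returns one summary row per group, ave() returns one value per input row. That is what a moving statistic needs.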
I spent hours searching Google. I think the R library I need is zoo and the function I want is rollapply, but I can't figure out how to combine aggregation with rollapply.
I tried:
moving_var <- Raw_data %>%
aggregate(HEIGHT,POS) %>%
rollapply( TEMPERATURE, width = 40, FUN = sd, fill = NA)
But it doesn't work. I'm very new to R programming and this is driving me crazy.

Try the following aggregate:
library(zoo)
result = aggregate(temp ~ pos + height,
data = df,
FUN = function(x){
rollapply(x, width = 40, FUN = var, by = 40)
}
)
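width is the width of the rolling window, and by is how many points the start of each window moves forward from the start of the previous one. With both set to 40, each window begins right after the previous one ends, so the windows do not overlap.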
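To see how width and by interact, here is a small sketch on the vector 1:10. It uses sum in place of var so that each window's contents are easy to read off (illustrative data, not from the question):

```r
library(zoo)

x <- 1:10

# width = 3, by = 3: non-overlapping windows (1,2,3), (4,5,6), (7,8,9);
# the 10th value never completes a window, so it is dropped
non_overlap <- rollapply(x, width = 3, FUN = sum, by = 3)

# by = 1 (the default): a new window starts at every observation, so windows overlap
overlap <- rollapply(x, width = 3, FUN = sum)

non_overlap
overlap
```

Without fill, only positions that have a full window produce a value. So a group of 10 readings gives three non-overlapping windows of width 3, or eight overlapping ones.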
The resulting data frame has one column per window, a structure usually called "wide". If you want it in "long" format, use gather from tidyr or melt from reshape2.
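If you would rather not add a package just for reshaping, a base-R sketch also works. In tidyr, pivot_longer() is the newer replacement for gather(). aggregate() stores the per-window results as a matrix column (here result$temp, one row per group and one column per window), so the long form can be built by stacking that matrix column by column. The data frame below is a compact rebuild of the toy df used in the example. The names long and window are illustrative.

```r
library(zoo)

# The toy data, built compactly: 16 pos/height groups x 10 readings each
df <- data.frame(
  pos    = rep(c(0, 90, 180, 270), times = 40),
  height = rep(rep(1:4, each = 4), times = 10),
  temp   = 1:160
)

result <- aggregate(temp ~ pos + height, data = df,
                    FUN = function(x) rollapply(x, width = 3, FUN = var, by = 3))

# result$temp is a matrix: one row per group, one column per window
n_win <- ncol(result$temp)

# Stack the matrix column by column (as.vector reads a matrix column-major)
long <- data.frame(
  pos    = rep(result$pos, times = n_win),
  height = rep(result$height, times = n_win),
  window = rep(seq_len(n_win), each = nrow(result)),
  var    = as.vector(result$temp)
)
head(long)
```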
For example:
df = structure(list(pos = c(0, 90, 180, 270, 0, 90, 180, 270, 0, 90,
180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270,
0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180,
270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90,
180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270,
0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180,
270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90,
180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270,
0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180,
270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90,
180, 270, 0, 90, 180, 270, 0, 90, 180, 270, 0, 90, 180, 270,
0, 90, 180, 270), height = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), temp = 1:160), .Names = c("pos",
"height", "temp"), row.names = c(NA, -160L), class = "data.frame")
> head(df,20)
pos height temp
1 0 1 1
2 90 1 2
3 180 1 3
4 270 1 4
5 0 2 5
6 90 2 6
7 180 2 7
8 270 2 8
9 0 3 9
10 90 3 10
11 180 3 11
12 270 3 12
13 0 4 13
14 90 4 14
15 180 4 15
16 270 4 16
17 0 1 17
18 90 1 18
19 180 1 19
20 270 1 20
library(zoo)
result = aggregate(temp ~ pos + height,
data = df,
FUN = function(x){
rollapply(x, width = 3, FUN = var, by = 3)
}
)
gives:
pos height temp.1 temp.2 temp.3
1 0 1 256 256 256
2 90 1 256 256 256
3 180 1 256 256 256
4 270 1 256 256 256
5 0 2 256 256 256
6 90 2 256 256 256
7 180 2 256 256 256
8 270 2 256 256 256
9 0 3 256 256 256
10 90 3 256 256 256
11 180 3 256 256 256
12 270 3 256 256 256
13 0 4 256 256 256
14 90 4 256 256 256
15 180 4 256 256 256
16 270 4 256 256 256
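Every entry is 256 for the following reason. In this toy data, consecutive readings from the same (pos, height) group are 16 rows apart, so temp rises by 16 between them and every 3-reading window has the form {a, a+16, a+32}. R's var computes the sample variance (denominator n − 1), which for such a window is:

```latex
\bar{x} = a + 16, \qquad
s^2 = \frac{(a - \bar{x})^2 + (a + 16 - \bar{x})^2 + (a + 32 - \bar{x})^2}{3 - 1}
    = \frac{(-16)^2 + 0^2 + 16^2}{2} = \frac{512}{2} = 256
```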
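Thanks, this is close to what I want. Is it possible to keep the timestamps? I don't want to skip any observations, i.e. every 6 minutes I want to look back over the previous 4 hours, so I figured I could omit by and it should work. Update: trying your code I get:
Error in seq.default(start.at, NROW(data), by = by) : wrong sign in 'by' argument

Another approach you can try is to subset the data into one data frame per combination of pos and height, and then run rollapply once on the temperature column of each.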
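Here is a sketch of that subset-per-combination approach. It keeps the timestamps and uses a trailing window with by left at its default of 1. Column names follow the question, and the data are made up. The nrow(d) guard is a precaution. The seq() error above is most likely caused by a group having fewer rows than the window width; in the toy df each pos/height group has only 10 readings, fewer than 40. Groups that are too short for a full 4-hour window therefore get NA here instead of an error.

```r
library(zoo)

# Made-up data in the question's layout: 3 heights x 3 positions, 6-minute readings
times <- seq(as.POSIXct("2018-05-16 00:00:00", tz = "UTC"), by = "6 min", length.out = 60)
Raw_data <- expand.grid(POS = c(90, 180, 270), HEIGHT = 1:3, DT_TI_RECORDED = times)
Raw_data$TEMPERATURE <- 110 + seq_len(nrow(Raw_data)) %% 7

win <- 40  # 4 hours of 6-minute readings

# One data frame per HEIGHT/POS combination
pieces <- split(Raw_data, list(Raw_data$HEIGHT, Raw_data$POS), drop = TRUE)

pieces <- lapply(pieces, function(d) {
  d <- d[order(d$DT_TI_RECORDED), ]  # chronological order within the group
  d$Moving_4hour_Var <- if (nrow(d) >= win) {
    # trailing window: current reading plus the 39 before it
    rollapplyr(d$TEMPERATURE, width = win, FUN = var, fill = NA)
  } else {
    rep(NA_real_, nrow(d))  # too few readings for a full window
  }
  d
})

# Recombine; each row keeps its DT_TI_RECORDED timestamp
moving_var <- do.call(rbind, pieces)
rownames(moving_var) <- NULL
head(moving_var[!is.na(moving_var$Moving_4hour_Var), ])
```

Splitting makes each group's computation explicit and easy to inspect. An ave()-based version avoids the split/recombine step but does the same per-group work.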