Simple moving average on an unbalanced panel in R

Tags: r, data.table, plyr, panel-data, split-apply-combine

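I am working with an unbalanced, irregularly spaced cross-sectional time series. My goal is to obtain a lagged moving-average vector of the "Quantity" vector, segmented by "Subject".

In other words, suppose the following quantities have been observed for Subject_1: [1,2,3,4,5]. I first need to lag by 1, which yields [NA,1,2,3,4].

Then I need to take a moving average of order 3, which gives [NA,NA,NA,(3+2+1)/3,(4+3+2)/3].

The above needs to be done for every subject.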
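The two steps described above can be checked on a single vector. This is only a minimal base R sketch of the arithmetic, not the method used later in the thread; the variable names x_lag and ma3 are made up for illustration.

```r
# Worked example for a single subject (base R only).
x <- 1:5

# Step 1: lag by one observation, padding the front with NA.
x_lag <- c(NA, head(x, -1))            # NA 1 2 3 4

# Step 2: trailing (right-aligned) moving average of order 3.
# stats::filter() is called with its namespace because dplyr masks filter().
ma3 <- as.numeric(stats::filter(x_lag, rep(1 / 3, 3), sides = 1))

print(ma3)                             # NA NA NA 2 3
```

Note that stats::filter() stops with an error when the series is shorter than the window, so this form cannot be applied as-is to subjects with fewer than three observations.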

# Construct example balanced panel DF
panel <- data.frame(
  as.factor(sort(rep(1:6,5))),
  rep(1:5,6),
  rnorm(30)                
)
colnames(panel)<- c("Subject","Day","Quantity")

#Make panel DF unbalanced
panelUNB <- subset(panel,as.numeric(Subject)!= Day)
panelUNB <- panelUNB[-c(15,16),]
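When applied to the balanced "panel" DF, this produces the correct result.

The problem is that plm's lag relies on the series being evenly spaced to generate its index variable, while rollapply requires the number of observations (the window size) to be equal across all subjects.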

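A data.table-based answer on StackExchange hints at a solution to my problem.

Perhaps that solution could be modified to produce a fixed-length moving average rather than a "rolling cumulative mean".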
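A fixed-length version along those lines might look like the sketch below. It is not the linked StackExchange code, which is not reproduced on this page. It assumes data.table 1.12.0 or later (for frollmean), that lagging by observation order within each subject is what is wanted, and a small made-up table dt.

```r
library(data.table)

# Made-up irregular panel: subject "a" has 5 observations, "b" has 4.
dt <- data.table(
  Subject  = rep(c("a", "b"), times = c(5, 4)),
  Day      = c(1, 2, 4, 5, 7, 1, 3, 4, 6),
  Quantity = c(1, 2, 3, 4, 5, 10, 20, 30, 40)
)
setorder(dt, Subject, Day)

# Within each subject: lag by one observation with shift(), then take a
# right-aligned rolling mean over 3 values with frollmean().
dt[, QuantityMA := frollmean(shift(Quantity, 1L), 3L), by = Subject]

print(dt)
```

For subject "a" the new column should read NA, NA, NA, 2, 3, which matches the worked example in the question.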

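So, to answer my own question, one approach is split-lapply(rollmean)-unlist (this is the Temp code listed further down the page).


Does this give you the desired result?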

library(reshape2)
library(zoo)

# create time series where each subject has an observation at each time step
d1 <- data.frame(subject = rep(letters[1:4], each = 5),
                 day = rep(1:5, 4),
                 quantity = sample(x = 1:4, size = 20, replace = TRUE))
d1

# select some random observations
d2 <- d1[sample(x = seq_len(nrow(d1)), size = 15), ]
d2

# reshape to wide format with dcast
# -> 'automatic' extension from irregular to regular series for each subject,
# _given_ that all time steps are represented.
# Alternative method below more explicit

# fill for structural missings defaults to NA
d3 <- dcast(d2, day ~ subject, value.var = "quantity")
d3

# convert to zoo time series 
z1 <- zoo(x = d3[ , -1], order.by = d3$day)

################################
# alternative method to extend time series
# time steps to include are given explicitly

# create a zero-dimensional zoo series
z0 <- zoo(, min(d1$day):max(d1$day))

# extend z1 to contain the same time indices as z0 
z1 <- merge(z1, z0) 
################################

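# lag by one time step so that each day holds the previous day's value.
# Note: in zoo a positive k shifts the series earlier in time (a lead),
# so k = -1 is needed here; na.pad = TRUE keeps the first time step as NA.
z2 <- lag(x = z1, k = -1, na.pad = TRUE)
z2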

# calculate rolling mean with window width 3
rollmeanr(x = z2, k = 3)

# Handling of NAs:
# from ?rollmean:
# "The default method of rollmean does not handle inputs that contain NAs.
# In such cases, use rollapply instead.": 
rollapplyr(data = z2, width = 3, FUN = mean, na.rm = TRUE)
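This answer requires regularizing the series. In my case that would mean inserting a large number of NAs into the series, which makes the moving average (na.rm = TRUE) behave erratically. However, I will use some of your ideas to "pad" the series with NAs rather than insert them. So +1 for sharing some useful code.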
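The concern in this comment can be made concrete with a toy series. The sketch below is a base R illustration with made-up values, not code from the thread. It regularizes one sparsely observed subject and shows how many real observations each 3-day window contains once na.rm = TRUE is used.

```r
# One subject observed on days 1, 2, 6 and 7 only (made-up values).
obs <- data.frame(day = c(1, 2, 6, 7), quantity = c(10, 20, 30, 40))

# Regularize: one row per day, NA where nothing was observed.
full <- merge(data.frame(day = 1:7), obs, by = "day", all.x = TRUE)

# Trailing window of 3 days: the mean with na.rm = TRUE, and the number
# of real observations that each window actually contained.
idx <- seq_len(nrow(full))
full$ma <- vapply(idx, function(i) {
  if (i < 3) NA_real_ else mean(full$quantity[(i - 2):i], na.rm = TRUE)
}, numeric(1))
full$n_used <- vapply(idx, function(i) {
  if (i < 3) NA_integer_ else sum(!is.na(full$quantity[(i - 2):i]))
}, integer(1))

print(full)
```

Some windows average two values, some only one, and the window covering days 3 to 5 contains none and returns NaN, so the "order 3" average is no longer computed over three observations.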
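# Code for the self-answer (split-lapply-unlist); rollapplyr() comes from zoo
library(zoo)
# Split Quantity into one vector per Subject
Temp <- with(panelUNB, split(Quantity, Subject))
# Right-aligned rolling mean of width 2 within each subject, padded with NA
Temp <- lapply(Temp, function(x) rollapplyr(
   x, width = 2, FUN = mean, fill = NA, na.rm = TRUE))
QuantityMA <- unlist(Temp)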
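The split-lapply-unlist code uses a window of width 2 and does not apply the lag. A variant that follows the question's specification (lag 1, then order 3) could look like the sketch below. It uses base R only, assumes that lagging by observation order within each subject is intended, and relies on a hypothetical helper named lagged_ma. A seed is added so the example is reproducible.

```r
# Rebuild the unbalanced panel from the question (seed added here).
set.seed(42)
panel <- data.frame(
  Subject  = as.factor(sort(rep(1:6, 5))),
  Day      = rep(1:5, 6),
  Quantity = rnorm(30)
)
panelUNB <- subset(panel, as.numeric(Subject) != Day)
panelUNB <- panelUNB[-c(15, 16), ]

# Hypothetical helper: lag x by `lag` observations, then take a trailing
# mean over `k` values. Positions without k lagged values become NA, so
# subjects with too few observations simply get all NA.
lagged_ma <- function(x, k = 3, lag = 1) {
  n <- length(x)
  x_lag <- c(rep(NA_real_, min(lag, n)), head(x, max(n - lag, 0)))
  vapply(seq_len(n), function(i) {
    if (i < k) NA_real_ else mean(x_lag[(i - k + 1):i])
  }, numeric(1))
}

# ave() applies the helper within each Subject and keeps the row order.
panelUNB$QuantityMA <- ave(panelUNB$Quantity, panelUNB$Subject,
                           FUN = lagged_ma)
print(panelUNB)
```

Because ave() returns a vector aligned with the original rows, the result can be stored as a column directly, without the unlist step.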