Lagging multiple variables multiple times in R
So, I'm working with a data frame that contains daily data over 444 days. I want to use several lagged variables in a regression model (lm), lagging each of them 7 times. I'm currently generating the lags like this:
email_data$email_reach1 <- lag(ts(email_data$email_reach, start = 1, end = 444), 1)
email_data$email_reach2 <- lag(ts(email_data$email_reach, start = 1, end = 444), 2)
email_data$email_reach3 <- lag(ts(email_data$email_reach, start = 1, end = 444), 3)
email_data$email_reach4 <- lag(ts(email_data$email_reach, start = 1, end = 444), 4)
email_data$email_reach5 <- lag(ts(email_data$email_reach, start = 1, end = 444), 5)
email_data$email_reach6 <- lag(ts(email_data$email_reach, start = 1, end = 444), 6)
email_data$email_reach7 <- lag(ts(email_data$email_reach, start = 1, end = 444), 7)
For any given n, I think this is equivalent to the code above:
n <- 7
for (i in 1:n) {
email_data[[paste0("email_reach", i)]] <- lag(ts(email_data$email_reach, start = 1, end = 444), i)
}
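A base-R alternative that avoids lag() entirely is stats::embed(), which returns a matrix holding a vector and its successive lags. The following is a minimal sketch, not the approach from this answer. The helper lag_matrix() is hypothetical, and the input values are copied from the email_reach output shown later in this thread:

```r
# Build lags 0..n of one vector with base R's stats::embed().
# embed(x, n + 1) returns a matrix whose column j holds x lagged by j - 1,
# but it has only length(x) - n rows, so the top n rows are padded with NA
# to line up with the original data (lm() drops those rows by default anyway).
lag_matrix <- function(x, n) {
  m <- stats::embed(x, n + 1)
  m <- rbind(matrix(NA, nrow = n, ncol = n + 1), m)
  colnames(m) <- paste0("lag", 0:n)
  m
}

x <- c(4, 4, 5, 7, 4, 7, 7, 6, 6, 3)
lags <- lag_matrix(x, 3)
head(lags, 5)
```

Note that with this padding every column is NA in the first n rows, even lag1 in row 2. That is harmless for lm(), which drops incomplete rows, but it differs from padding each lag separately.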
Based on Molx's answer, but generalized to any list of variables and patched a bit... Thanks, Molx!
do_lag <- function(the_data, variables, num_periods) {
  num_rows <- nrow(the_data)
  for (v in variables) {
    for (i in seq_len(num_periods)) {
      # Shift the column down by i rows and pad the top with NA,
      # creating columns named e.g. email_reach1 ... email_reach7
      the_data[[paste0(v, i)]] <- c(rep(NA, i), head(the_data[[v]], num_rows - i))
    }
  }
  return(the_data)
}
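A usage sketch showing how lags built this way feed into the lm() from the question. It assumes simulated data. The column sales and the helper add_lags() are hypothetical, and add_lags() is a compact restatement of the same head()-shifting idea so that the block runs on its own:

```r
set.seed(1)
df <- data.frame(email_reach = rpois(30, 5), sales = rpois(30, 20))

# Add lags 1..n of each variable by shifting values down and padding with NA.
add_lags <- function(data, vars, n) {
  for (v in vars) {
    for (i in seq_len(n)) {
      data[[paste0(v, i)]] <- c(rep(NA, i), head(data[[v]], -i))
    }
  }
  data
}

lagged <- add_lags(df, "email_reach", 7)

# Build the formula from the generated names instead of typing them out.
rhs <- c("email_reach", paste0("email_reach", 1:7))
fit <- lm(reformulate(rhs, response = "sales"), data = lagged)
nobs(fit)  # 23: the first 7 rows contain NA lags and are dropped by na.omit
```

Generating the formula with reformulate() keeps the model in sync with num_periods if you later change the number of lags.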
Another approach is to use the xts library. Here is a small example; we start with the following:
x <- ts(matrix(rnorm(100),ncol=2), start=c(2009, 1), frequency=12)
head(x)
Series 1 Series 2
[1,] -1.82934747 -0.1234372
[2,] 1.08371836 1.3365919
[3,] 0.95786815 0.0885484
[4,] 0.59301446 -0.6984993
[5,] -0.01094955 -0.3729762
[6,] -0.19256525 0.3137705
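The rest of the xts example is not preserved here. As a base-R stand-in (not xts), the same multivariate ts x can be lagged correctly by aligning the series on time. stats::lag() only moves the time base, and a negative k moves the series later, so the value at time t becomes x[t - k]. The names L0 to L3 are illustrative:

```r
set.seed(1)
x <- ts(matrix(rnorm(100), ncol = 2), start = c(2009, 1), frequency = 12)

# One lagged copy of the whole matrix series per lag order.
lags <- lapply(1:3, function(i) stats::lag(x, k = -i))

# ts.union() aligns the series by time and pads with NA where they do not overlap.
aligned <- do.call(ts.union, c(list(L0 = x), setNames(lags, paste0("L", 1:3))))

# The union extends past end(x) by the largest lag; trim back to x's span.
aligned <- window(aligned, start = start(x), end = end(x))
dim(aligned)
```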
You can also use data.table (h/t @akrun).
set.seed(1)
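Only the set.seed(1) line of that answer survives here. The following is a sketch of the shift() approach, not the answer's original code. It assumes the data.table package is installed, and the table dt with its columns email_reach and sales is hypothetical. shift() accepts a vector of offsets and returns one lagged vector per offset, so all seven lags of a variable can be added by reference in a single := call:

```r
library(data.table)

set.seed(1)
dt <- data.table(email_reach = rpois(20, 5), sales = rpois(20, 20))

# For each variable, add columns <var>1 ... <var>7 holding lags 1..7.
for (v in c("email_reach", "sales")) {
  dt[, (paste0(v, 1:7)) := shift(dt[[v]], n = 1:7, type = "lag")]
}
head(dt, 8)
```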
This isn't really an answer; I'm just using the answer format to elaborate on my warning above:
email_data <- data.frame( email_reach=ts(email_data$email_reach, start = 1, end = 444))
collapse::flag provides a general, fast (C++-based) solution to this problem:
library(collapse)
# Time-series (also supports xts and others)
head(flag(AirPassengers, -1:2))
## F1 -- L1 L2
## Jan 1949 118 112 NA NA
## Feb 1949 132 118 112 NA
## Mar 1949 129 132 118 112
## Apr 1949 121 129 132 118
## May 1949 135 121 129 132
## Jun 1949 148 135 121 129
# Time-series matrix
head(flag(EuStockMarkets, -1:2))
## Time Series:
## Start = c(1991, 130)
## End = c(1998, 169)
## Frequency = 260
## F1.DAX DAX L1.DAX L2.DAX F1.SMI SMI L1.SMI L2.SMI F1.CAC CAC L1.CAC L2.CAC F1.FTSE FTSE L1.FTSE L2.FTSE
## 1991.496 1613.63 1628.75 NA NA 1688.5 1678.1 NA NA 1750.5 1772.8 NA NA 2460.2 2443.6 NA NA
## 1991.500 1606.51 1613.63 1628.75 NA 1678.6 1688.5 1678.1 NA 1718.0 1750.5 1772.8 NA 2448.2 2460.2 2443.6 NA
## 1991.504 1621.04 1606.51 1613.63 1628.75 1684.1 1678.6 1688.5 1678.1 1708.1 1718.0 1750.5 1772.8 2470.4 2448.2 2460.2 2443.6
## 1991.508 1618.16 1621.04 1606.51 1613.63 1686.6 1684.1 1678.6 1688.5 1723.1 1708.1 1718.0 1750.5 2484.7 2470.4 2448.2 2460.2
## 1991.512 1610.61 1618.16 1621.04 1606.51 1671.6 1686.6 1684.1 1678.6 1714.3 1723.1 1708.1 1718.0 2466.8 2484.7 2470.4 2448.2
## 1991.515 1630.75 1610.61 1618.16 1621.04 1682.9 1671.6 1686.6 1684.1 1734.5 1714.3 1723.1 1708.1 2487.9 2466.8 2484.7 2470.4
# Data frame
head(flag(airquality[1:3], -1:2))
## F1.Ozone Ozone L1.Ozone L2.Ozone F1.Solar.R Solar.R L1.Solar.R L2.Solar.R F1.Wind Wind L1.Wind L2.Wind
## 1 36 41 NA NA 118 190 NA NA 8.0 7.4 NA NA
## 2 12 36 41 NA 149 118 190 NA 12.6 8.0 7.4 NA
## 3 18 12 36 41 313 149 118 190 11.5 12.6 8.0 7.4
## 4 NA 18 12 36 NA 313 149 118 14.3 11.5 12.6 8.0
## 5 28 NA 18 12 NA NA 313 149 14.9 14.3 11.5 12.6
## 6 23 28 NA 18 299 NA NA 313 8.6 14.9 14.3 11.5
# Panel lag
head(flag(iris[1:2], -1:2, iris$Species))
## Panel-lag computed without timevar: Assuming ordered data
## F1.Sepal.Length Sepal.Length L1.Sepal.Length L2.Sepal.Length F1.Sepal.Width Sepal.Width L1.Sepal.Width L2.Sepal.Width
## 1 4.9 5.1 NA NA 3.0 3.5 NA NA
## 2 4.7 4.9 5.1 NA 3.2 3.0 3.5 NA
## 3 4.6 4.7 4.9 5.1 3.1 3.2 3.0 3.5
## 4 5.0 4.6 4.7 4.9 3.6 3.1 3.2 3.0
## 5 5.4 5.0 4.6 4.7 3.9 3.6 3.1 3.2
## 6 4.6 5.4 5.0 4.6 3.4 3.9 3.6 3.1
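Similarly, collapse::fdiff and collapse::fgrowth support lags/leads as well as iterated (quasi-, log-) differences and growth rates on (multivariate) time series and panels.
If you lag a data frame, you can assign the variable names after the fact with something like colnames(lagged).
The only lag method with a "start" argument that I see among my currently loaded packages is lag.zooreg. You should post the library calls for the packages you have loaded. (I find the lag function often fails to deliver the results I expect; it takes some care to get the intended result.)
I'm using lag from base R. The "start" argument is for ts, also in base R.
I didn't see the ts() call. The warning about making sure it's doing what you expect still applies.
@akrun - wow, didn't know shift had this nifty feature. Thanks!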
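After this conversion and the question's lag assignments, every "lagged" column is identical to email_reach:
email_data <- data.frame( email_reach=ts(email_data$email_reach, start = 1, end = 444))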
> head(email_data, 10)
email_reach email_reach1 email_reach2 email_reach3 email_reach4
1 4 4 4 4 4
2 4 4 4 4 4
3 5 5 5 5 5
4 7 7 7 7 7
5 4 4 4 4 4
6 7 7 7 7 7
7 7 7 7 7 7
8 6 6 6 6 6
9 6 6 6 6 6
10 3 3 3 3 3
email_reach5 email_reach6 email_reach7
1 4 4 4
2 4 4 4
3 5 5 5
4 7 7 7
5 4 4 4
6 7 7 7
7 7 7 7
8 6 6 6
9 6 6 6
10 3 3 3
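The reason is that stats::lag() on a ts object changes only its time attributes (tsp), not the order of the stored values. A data frame matches columns to rows by position, so the time shift is lost. A minimal sketch with made-up values:

```r
x <- ts(c(4, 4, 5, 7, 4), start = 1)
l1 <- stats::lag(x, 1)

tsp(x)   # start, end, frequency: 1 5 1
tsp(l1)  # 0 4 1: same values, time base shifted back by one

# Stored by position in a data frame, the "lagged" column equals the original.
df <- data.frame(email_reach = as.numeric(x), email_reach1 = as.numeric(l1))
identical(df$email_reach, df$email_reach1)  # TRUE

# Aligning on time (here with ts.intersect) is what actually lags the values;
# note k = -1 gives the value from the previous period.
ti <- ts.intersect(x, l1 = stats::lag(x, -1))
ti
```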