R moving average over 5 years with irregular dates


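I have a large number of files (~1200), each containing a lot of groundwater height data. The start date and the length of the series differ from file to file, and there can be large gaps between dates, for example (a small part of one such file):

I want to calculate the average height over 5-year periods, so in the example: 14-1-1980 + 5 years, 14-1-1985 + 5 years, and so on. The number of data points differs each time an average is calculated, and the date exactly 5 years later will most likely not be present as a data point in the dataset. So I think I need to tell R to take the average over a time period.

I searched online but did not find anything that fits my needs. I came across many useful packages, such as uts, zoo and lubridate, and the function aggregate. Instead of getting closer to a solution, I became more and more confused about which approach best fits my problem.

Thanks a lot in advance.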

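Hey, I tried this as soon as I saw your question!!! Run it on the sample data frame first. Once you understand the code, try it on your own data and let me know.

BTW, the interval here is not 5 years; I only used 2 months (2*30 days, roughly 2 months) as the interval.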

df = data.frame(Date = c("14-1-1980", "28-1-1980", "14-2-1980", "14-3-1980", "28-3-1980",
                         "14-4-1980", "25-4-1980", "14-5-1980", "29-5-1980", "13-6-1980",
                         "27-6-1980", "14-7-1980", "28-7-1980", "14-8-1980"), height = 1:14)

# Parse the day-month-year strings once instead of inside the loop
dates = as.Date(df$Date, "%d-%m-%Y")

df1 = data.frame(orig = NULL, dest = NULL, avg_ht = NULL)
orig = dates[1]
dest = dates[1] + 2*30             # approx. 2 months
dest_final = dates[length(dates)]  # last observation

# Average the heights in consecutive blocks [orig, dest). The loop stops once a
# block would run past the last observation, so the final partial block is skipped.
while (dest < dest_final){
  m = mean(df$height[which(dates >= orig & dates < dest)])
  df1 = rbind(df1, data.frame(orig = orig, dest = dest, avg_ht = m))
  orig = dest
  dest = dest + 2*30
  print(paste("orig:", orig, " + ", "dest:", dest))
}

> df1
        orig       dest avg_ht
1 1980-01-14 1980-03-14    2.0
2 1980-03-14 1980-05-13    5.5
3 1980-05-13 1980-07-12    9.5


I hope this works for you too.
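The loop above walks forward in fixed 60-day blocks. A vectorized base-R sketch of the same block-average idea, using the question's 5-calendar-year interval instead, could look like this. It assumes dates stored as d-m-Y strings in a Date column and heights in a height column; the sample values are made up for illustration. findInterval() assigns each measurement to the block it falls in, so irregular gaps need no special handling, and blocks without any measurement simply do not appear in the result.

```r
# Made-up sample: irregular dates in the question's d-m-Y format
df <- data.frame(
  Date   = c("14-1-1980", "28-6-1982", "3-2-1985", "30-12-1989", "1-1-1990"),
  height = c(10, 20, 30, 40, 50)
)
dates <- as.Date(df$Date, "%d-%m-%Y")

# Block starts every 5 calendar years from the first measurement
breaks <- seq(min(dates), max(dates), by = "5 years")

# Index of the block [breaks[i], breaks[i + 1]) each measurement falls in
block <- findInterval(dates, breaks)

# Mean height and number of measurements per non-empty block
means  <- tapply(df$height, block, mean)
counts <- tapply(df$height, block, length)
res <- data.frame(
  start       = breaks[as.integer(names(means))],
  n           = as.vector(counts),
  mean_height = as.vector(means)
)
print(res)
```

Reporting n alongside the mean makes it visible that each 5-year average rests on a different number of data points, as the question points out.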

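This is my best attempt, but keep in mind that I work with years rather than full dates, i.e. for the example you gave, I average from the beginning of 1980 to the end of 1984.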

dat <- read.csv("paixnidi.csv", stringsAsFactors = FALSE)
# install.packages("stringr")   # run once if stringr is not installed yet
library(stringr)
dates <- dat[,1]
# extract the year of each measurement (last 4 characters of "d-m-YYYY")
years <- as.integer(str_sub(dates, start = -4))
first_year <- min(years)
spread_y <- max(years) - first_year

ind <- list()
# number of 5-year intervals needed to cover first_year .. max(years)
groups <- floor(spread_y/5) + 1
meangroups <- matrix(0, ncol = 2, nrow = groups)
k <- 0
for (i in 1:groups){
  # indices of the measurements within the 5-year period [start, start + 4]
  ind[[i]] <- which(years >= (first_year + k) & years <= (first_year + k + 4))
  meangroups[i,2] <- mean(dat[ind[[i]],2])
  meangroups[i,1] <- (first_year + k)
  k <- k + 5
}

colnames(meangroups) <- c("Year:Year+4", "Mean Height (cm)")
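The groups/k bookkeeping above can also be replaced by computing each measurement's 5-year bin directly with integer division and letting aggregate() (one of the functions mentioned in the question) do the grouping. This is a sketch assuming a Date column in d-m-Y format and a Height column; the data below are made up. As in the answer above, the bins are calendar-year blocks [first year + 5k, first year + 5k + 4], not exact date windows.

```r
# Made-up sample with irregular dates in d-m-Y format
dat <- data.frame(
  Date   = c("14-1-1980", "2-7-1983", "5-3-1985", "20-11-1989", "8-8-1990"),
  Height = c(7659, 7640, 7620, 7610, 7600)
)

# Year of each measurement
year <- as.integer(format(as.Date(dat$Date, "%d-%m-%Y"), "%Y"))

# First year of the 5-year bin each measurement belongs to
first <- min(year)
dat$period_start <- first + 5L * ((year - first) %/% 5L)

# Mean height per bin; bins without measurements simply do not appear
res <- aggregate(Height ~ period_start, data = dat, FUN = mean)
print(res)
```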

As @vagabond pointed out, you need to combine the 1200 files into a single data frame (the plyr package lets you do something simple like: data.all%…
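One base-R way to stack many per-well files into one data frame, without plyr, is list.files() plus lapply(read.csv) plus do.call(rbind, ...). The sketch below assumes every file is a CSV with the same columns (here Date and Height) and adds a hypothetical well column, taken from the file name, so later averages can be computed per well. The two demo files are written to a temporary directory only so the example runs on its own; the directory name is made up.

```r
# Demo only: write two small, made-up CSV files to a temporary directory
data_dir <- file.path(tempdir(), "groundwater_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("14-1-1980", "28-1-1980"), Height = c(7659, 7632)),
          file.path(data_dir, "well_A.csv"), row.names = FALSE)
write.csv(data.frame(Date = "3-5-1992", Height = 5100),
          file.path(data_dir, "well_B.csv"), row.names = FALSE)

# Read every CSV in the directory and tag its rows with the file name
files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
all_wells <- do.call(rbind, lapply(files, function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  d$well <- tools::file_path_sans_ext(basename(f))
  d
}))

# Parse the dates once for the whole combined table
all_wells$Date <- as.Date(all_wells$Date, "%d-%m-%Y")
print(all_wells)
```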
The foverlaps function from the data.table package is the best choice for this situation:

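library(data.table)
library(lubridate)

# convert to a data.table with setDT()
# convert the 'Date' column to Date format
# create a begin & end date for the required period (use years(5) for a 5-year window)
setDT(dat)[, Date := as.Date(Date, '%d-%m-%Y')
           ][, `:=` (begindate = Date, enddate = Date + years(1))]

# set the keys (necessary for the foverlaps function)
setkey(dat, begindate, enddate)

# match each measurement date (as the zero-length interval [Date, begindate])
# against every [begindate, enddate] window containing it, then average the
# matched heights (i.Height) per window start (Date)
res <- foverlaps(dat, dat, by.x = c("Date", "begindate"))[, .(moving.average = mean(i.Height)), Date]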

Now, for each date, you have the average of all values that fall between that date and one year after it.
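To see what the foverlaps result means, or to try it on a single file without data.table, here is a plain base-R sketch of the same forward-looking window: for every measurement date d it averages all heights dated in the closed interval [d, d + span_years]. forward_window_mean() and span_years are names made up for this illustration; span_years = 1 mirrors the code above and span_years = 5 gives the question's window. It compares every date with every window, so it is O(n^2) and only meant for modest series.

```r
# Forward-window mean: for each observed date d, average all values
# whose date lies in the closed interval [d, d + span_years years]
forward_window_mean <- function(dates, values, span_years = 5) {
  step <- paste(span_years, "years")
  # Window end: same calendar day span_years later (seq() steps by calendar years)
  ends <- vapply(seq_along(dates), function(i) {
    as.numeric(seq(dates[i], by = step, length.out = 2)[2])
  }, numeric(1))
  ends <- as.Date(ends, origin = "1970-01-01")

  # Compare every date with every window: O(n^2), fine for small series
  vapply(seq_along(dates), function(i) {
    mean(values[dates >= dates[i] & dates <= ends[i]])
  }, numeric(1))
}

# Made-up sample data
dates  <- as.Date(c("14-1-1980", "14-7-1982", "14-1-1985", "1-6-1990"), "%d-%m-%Y")
height <- c(10, 20, 30, 40)
res <- data.frame(Date = dates,
                  moving.average = forward_window_mean(dates, height, span_years = 5))
print(res)
```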

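First, you could read all the files into a single data frame. Maybe have a look at rollapply in the zoo package.

Because I used a while() loop, the code will be slow. But I hope this really gets you started exploring!! Update me with the results!

Your example runs fine; I'll see how it works when I run it on all the files. Thanks a lot.

@BartM How did the solutions to your problem work out? Please accept one of the answers.

Note how I interpret it: @BartM is looking for the mean of the data points between a given date and the date 5 years earlier. You are also creating new periods for every missing date, which is not what BartM asked, imho. The question is a bit ambiguous on this point, but if you want the periods to be based only on the dates in the existing data,

dates.starT…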
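If, as the comment above interprets it, each window should run backward from a measurement (all heights dated in [d - 5 years, d]) and only the existing dates should be used as window ends, the means can be computed for every observed date at once with cumulative sums and findInterval(). This is a sketch with a made-up helper name, trailing_window_mean(), and made-up sample values. It sorts the series first because findInterval() needs a non-decreasing lookup vector; tied dates share the same window because both bounds count all equal dates.

```r
# Trailing-window mean: for each observed date d, average all values
# whose date lies in [d - span_years years, d]
trailing_window_mean <- function(dates, values, span_years = 5) {
  o <- order(dates)
  dates <- dates[o]
  values <- values[o]

  # Window start: same calendar day span_years earlier
  step <- paste(-span_years, "years")
  starts <- vapply(seq_along(dates), function(i) {
    as.numeric(seq(dates[i], by = step, length.out = 2)[2])
  }, numeric(1))

  d  <- as.numeric(dates)
  cs <- c(0, cumsum(values))
  hi <- findInterval(d, d)                         # number of dates <= d
  lo <- findInterval(starts, d, left.open = TRUE)  # number of dates < start
  data.frame(Date = dates,
             n = hi - lo,
             moving.average = (cs[hi + 1] - cs[lo + 1]) / (hi - lo))
}

# Made-up sample data
dates  <- as.Date(c("14-1-1980", "14-7-1982", "14-1-1985", "1-6-1990"), "%d-%m-%Y")
height <- c(10, 20, 30, 40)
res <- trailing_window_mean(dates, height, span_years = 5)
print(res)
```

The cumulative-sum trick keeps this at O(n log n) per file, which may matter with ~1200 long series.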
library(lubridate)
df$date.new <- as.Date(dmy(df$Date))

       Date Height   date.new
1 14-1-1980   7659 1980-01-14
2 28-1-1980   7632 1980-01-28
3 14-2-1980   7661 1980-02-14
4 14-3-1980   7638 1980-03-14
5 28-3-1980   7642 1980-03-28
6 14-4-1980   7652 1980-04-14

date.start <- seq(as.Date('1980-01-14'), as.Date('1985-01-14'), by = 'day')
date.end <- date.start + years(1)
dates <- data.frame(start = date.start, end = date.end)

       start        end
1 1980-01-14 1981-01-14
2 1980-01-15 1981-01-15
3 1980-01-16 1981-01-16
4 1980-01-17 1981-01-17
5 1980-01-18 1981-01-18
6 1980-01-19 1981-01-19

library(dplyr)
df.mean <- dates %>% 
    group_by(start, end) %>% 
    summarize(height.mean = mean(df$Height[df$date.new >= start & df$date.new < end]))

       start        end height.mean
      <date>     <date>       <dbl>
1 1980-01-14 1981-01-14    7630.273
2 1980-01-15 1981-01-15    7632.045
3 1980-01-16 1981-01-16    7632.045
4 1980-01-17 1981-01-17    7632.045
5 1980-01-18 1981-01-18    7632.045
6 1980-01-19 1981-01-19    7632.045

Applied to the sample data, the foverlaps approach gives:

> head(res,15)
          Date moving.average
 1: 1980-01-14       7633.217
 2: 1980-01-28       7635.000
 3: 1980-02-14       7637.696
 4: 1980-03-14       7636.636
 5: 1980-03-28       7641.273
 6: 1980-04-14       7645.261
 7: 1980-04-25       7644.955
 8: 1980-05-14       7646.591
 9: 1980-05-29       7647.143
10: 1980-06-13       7648.400
11: 1980-06-27       7652.900
12: 1980-07-14       7655.789
13: 1980-07-28       7660.550
14: 1980-08-14       7660.895
15: 1980-08-28       7664.000