Timestamp arithmetic in R

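I have two data frames (df1, df2) containing measurements taken over roughly the same period, but with different timestamps. df1 has hourly data; df2 has 2-3 measurements per hour. I want to:

1. Compare the hourly means of df2 with the hourly values in df1, i.e. one value per hour for each data frame.

2. Create a new column in df2 (df2$hrly) that holds, for each timestamp in df2, the df1 value for that hour, i.e. 2-3 values per hour (depending on how many timestamps df2 has in that hour).

subset and filter do not work in this case, and I don't want to use loops. I am thinking of using strftime and aggregate. Is there a better way? I am learning the data.table package, so perhaps there is a faster or more convenient approach.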
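For reference, the strftime and aggregate idea mentioned above could look roughly like this in base R. This is a minimal sketch on made-up toy data (the values and the hour column are hypothetical, not the asker's data). Note that labelling by clock hour with strftime truncates to the hour rather than rounding to the nearest one.

```r
# Hypothetical toy data standing in for df2 (not the asker's real data)
df2 <- data.frame(
  time_stamp = as.POSIXct(c("2016-01-01 06:05", "2016-01-01 06:25",
                            "2016-01-01 06:50", "2016-01-01 07:10",
                            "2016-01-01 07:40"), tz = "GMT"),
  dust = c(10, 12, 14, 8, 10)
)

# Label each row with the clock hour it falls in (this truncates, it does not round)
df2$hour <- strftime(df2$time_stamp, format = "%Y-%m-%d %H:00", tz = "GMT")

# One mean per hour label
hourly <- aggregate(dust ~ hour, data = df2, FUN = mean)
print(hourly)
```

The hour label here is a character string, so it would have to be converted back with as.POSIXct before being compared against df1$tme.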

Here is what df1 and df2 look like:

> glimpse(df1)
Observations: 7,770
Variables: 7
$ lat      <dbl> 30.46198, 30.46198, 30.46198, 30.46198, 30.46198, 30....
$ lon      <dbl> -91.17922, -91.17922, -91.17922, -91.17922, -91.17922...
$ date_gmt <chr> "2016-01-01", "2016-01-01", "2016-01-01", "2016-01-01...
$ time_gmt <chr> "06:00", "07:00", "08:00", "09:00", "10:00", "11:00",...
$ dust     <dbl> 10.7, 8.0, 8.3, 11.1, 9.1, 10.5, 9.7, 13.5, 10.5, 10....
$ state    <chr> "Louisiana", "Louisiana", "Louisiana", "Louisiana", "...
$ tme      <dttm> 2016-01-01 06:00:00, 2016-01-01 07:00:00, 2016-01-01...

df2$time_stamp is a POSIXct object (tz = "EST").
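Because the two tables use different time zones, it helps to remember that a POSIXct value stores an absolute instant and the time zone only affects how it is printed. A small base R sketch with a hypothetical timestamp (not taken from the data above):

```r
# Hypothetical timestamp; "EST" is a fixed UTC-5 zone with no daylight saving
x <- as.POSIXct("2016-01-01 01:00:00", tz = "EST")

y <- x
attr(y, "tzone") <- "GMT"  # changes only how the instant is displayed

print(format(x, "%Y-%m-%d %H:%M"))  # clock time in EST
print(format(y, "%Y-%m-%d %H:%M"))  # same instant, shown in GMT
print(as.numeric(x) == as.numeric(y))  # the underlying seconds are unchanged
```

Changing the tzone attribute therefore does not shift the measurements; it only makes the printed clock times of the two tables comparable before rounding to the hour.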

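Since I don't have your data, this is the best I can do. I hope it works.

I am assuming you want to compare the dust variable (the only variable the two data frames have in common). I am also assuming that "compare" means you just want to look at the delta.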

Steps:

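Make sure both time zones are the same

Convert the timestamps to hourly data

Compute the mean of the variable for each hour

Merge on the timestamp

Compute a delta for the comparison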

Test data:

library(data.table)

# Simulated test data: df1 is hourly (GMT), df2 has a reading every 6 minutes (EST)
df1 <- data.table(
  tme  = seq.POSIXt(as.POSIXct("2016-01-01 00:00", tz = "GMT"), by = 3600, length.out = 100),
  dust = rnorm(100)
)
df2 <- data.table(matrix(rnorm(1000 * 8), 1000, 8))
setnames(df2, c("dp1", "dp2", "hz", "rh", "degc", "cfm", "dust", "dur"))
df2[, time_stamp := seq.POSIXt(as.POSIXct("2016-01-01 00:00", tz = "EST"), by = 360, length.out = 1000)]

dplyr::glimpse(df1)
dplyr::glimpse(df2)

# first snippet
attr(df2$time_stamp, "tzone") <- "GMT"  # put both tables in the same time zone
# round each timestamp to the nearest hour (lubridate::floor_date would truncate instead)
df2[, tme := lubridate::round_date(time_stamp, unit = "hours")]
# hourly mean of dust, assumed to be the only common variable to compare
df3 <- df2[, mean(dust), by = c("tme")]
setnames(df3, c("tme", "dustmean"))
# all = TRUE keeps every hour that appears in either table
df_compare <- merge(df1, df3, by = "tme", all = TRUE)
df_compare[, delta_dust := dust - dustmean]  # is this the comparison you want?
plot(df_compare$delta_dust)

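glimpse output is not much help when asking a question, because others cannot copy and paste it into their session.

In my script I use format(df1$tme, tz = "EST", usetz = TRUE) to make sure both are on the same time, and round(df1$tme, units = "hours") because the seconds are irrelevant to this analysis. The mean dust value works for part 1 of my question, but it does not answer part 2. There are also other columns (removed here) that are used in some calculations. I really should improve my understanding of data.table operations.

@Marwaha Which columns do you need? All of them?

Basically, for every timestamp in df2, I want to take the rounded value of time_stamp and use it to find the corresponding value in df1, something like subset(df1, tme == round(df2$time_stamp[i], units = "hours")) %>% select(dust), where i is the current step of a for (i in 1:nrow(df2)) loop.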
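The row-by-row lookup described above can usually be expressed as a join instead of a loop. The following is a minimal sketch using a data.table update join on made-up toy data; the column names tme, dust, time_stamp and hrly follow the thread, but the values are hypothetical and this is not necessarily the code the answerer had in mind.

```r
library(data.table)

# Hypothetical toy data with the same column names as in the thread
df1 <- data.table(
  tme  = as.POSIXct(c("2016-01-01 06:00", "2016-01-01 07:00",
                      "2016-01-01 08:00"), tz = "GMT"),
  dust = c(10.7, 8.0, 8.3)
)
df2 <- data.table(
  time_stamp = as.POSIXct(c("2016-01-01 06:10", "2016-01-01 06:20",
                            "2016-01-01 06:50", "2016-01-01 07:40"), tz = "GMT")
)

# Round each df2 timestamp to the nearest hour.
# round() on a date-time returns POSIXlt, so convert back to POSIXct for the join.
df2[, tme := as.POSIXct(round(time_stamp, "hours"))]

# Update join: every df2 row receives the df1 dust value for its rounded hour
df2[df1, on = "tme", hrly := i.dust]
print(df2)
```

Rows of df2 whose rounded hour has no match in df1 would be left as NA in hrly, which makes gaps in the hourly series easy to spot.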

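The columns of the two real datasets are much like dust, there are just more of them. For now, I need dp1, dp2, dust, cfm, degc and rh.

@Marwaha The new code snippet should do what you want. Please give it a try.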
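The new snippet referred to here is not included in this page. As an illustration only, hourly means for several columns at once can be sketched with .SD and .SDcols in data.table. The data below is random and hypothetical; only the column names come from the thread.

```r
library(data.table)

set.seed(1)  # hypothetical random data, for illustration only
cols <- c("dp1", "dp2", "dust", "cfm", "degc", "rh")

# Nine readings, 20 minutes apart, one random column per variable
df2 <- data.table(matrix(rnorm(9 * length(cols)), nrow = 9))
setnames(df2, cols)
df2[, time_stamp := seq(as.POSIXct("2016-01-01 00:00", tz = "GMT"),
                        by = 1200, length.out = 9)]

# Round to the nearest hour (POSIXlt result converted back to POSIXct)
df2[, tme := as.POSIXct(round(time_stamp, "hours"))]

# Mean of every listed column within each rounded hour
hourly <- df2[, lapply(.SD, mean), by = tme, .SDcols = cols]
print(hourly)
```

The resulting table has one row per hour and can be merged with df1 on tme in the same way as the single dust column in the answer.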