R: closest value to every 15-minute interval

Tags: r, dplyr, data.table, zoo

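I'm looking to obtain the closest previous reading, out of any number of readings, for each 15-minute interval boundary (i.e. 12:00:00 AM, 12:15:00 AM, 12:30:00 AM).

For example, given df: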

Timestamp   Value (kW)
8/12/2018 23:00:06  51
8/13/2018 0:00:16   52
8/13/2018 0:10:26   53
8/13/2018 0:14:36   54
8/13/2018 0:15:00   55
8/13/2018 0:19:57   56
8/13/2018 0:29:09   57
8/13/2018 0:38:17   58
8/13/2018 0:44:59   59
8/13/2018 0:45:00   60
8/13/2018 0:58:47   61
8/13/2018 1:01:57   62


structure(list(Timestamp = c("8/12/2018 23:00:00", "8/13/2018 0:00:00", 
"8/13/2018 0:10:00", "8/13/2018 0:14:00", "8/13/2018 0:15:00", 
"8/13/2018 0:19:00", "8/13/2018 0:29:00", "8/13/2018 0:38:00", 
"8/13/2018 0:44:00", "8/13/2018 0:45:00", "8/13/2018 0:58:00", 
"8/13/2018 1:01:00"), Value..kW. = 51:62), .Names = c("Timestamp", 
"Value..kW."), class = "data.frame", row.names = c(NA, -12L))
I would like the result to look more like df2:

Interval    Value
8/13/2018 0:00:00   51
8/13/2018 0:15:00   55
8/13/2018 0:30:00   57
8/13/2018 0:45:00   60
8/13/2018 1:00:00   61
Note the seconds.

I was thinking the na.locf function in zoo, dplyr, or data.table could get me partially there. I'm open to other packages.

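This may differ slightly from your example result. I'm not sure your example output is 100% correct, e.g. what about the data for 12/8?

The lubridate library has many useful date-time features. The code below converts the character column to a date-time and rounds it to the nearest period. (There are also floor_date and ceiling_date functions, which round down or up respectively.)

library(dplyr)
library(lubridate)
df %>%
  # make sure the timestamp is a date-time type, and round to the nearest fifteen minutes
  # (the timestamps used here have no seconds; use mdy_hms() if yours do)
  mutate(ts = mdy_hm(Timestamp),
         period = round_date(ts, unit = "15 minutes")) %>%
  # group by period
  group_by(period) %>%
  # get the first row for each period, ordered by timestamp (use 1 instead of -1 for the last row)
  top_n(-1, ts) %>%
  # order the result
  arrange(period)
#   Timestamp       Value..kW. ts                  period
#
# 1 8/12/2018 23:00         51 2018-08-12 23:00:00 2018-08-12 23:00:00
# 2 8/13/2018 0:00          52 2018-08-13 00:00:00 2018-08-13 00:00:00
# 3 8/13/2018 0:10          53 2018-08-13 00:10:00 2018-08-13 00:15:00
# 4 8/13/2018 0:29          57 2018-08-13 00:29:00 2018-08-13 00:30:00
# 5 8/13/2018 0:38          58 2018-08-13 00:38:00 2018-08-13 00:45:00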
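
To make the difference between the three rounding helpers concrete, here is a small sketch applied to a single timestamp taken from the question's table (0:10:26). It assumes lubridate is installed; the variable name `ts` is only for illustration.

```r
library(lubridate)

# One reading from the question, parsed with seconds (UTC by default)
ts <- mdy_hms("8/13/2018 0:10:26")

floor_date(ts, unit = "15 minutes")    # rounds down: 2018-08-13 00:00:00
ceiling_date(ts, unit = "15 minutes")  # rounds up:   2018-08-13 00:15:00
round_date(ts, unit = "15 minutes")    # nearest:     2018-08-13 00:15:00
```

Rounding to the nearest boundary can therefore assign a reading to an interval boundary that lies after it, which is one reason the output above differs from a strict "closest previous reading" result.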

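This might be a good application for data.table, using a rolling join with the "nearest" option.

The first step is to get the data into a data.table object with a correctly formatted POSIXct timestamp.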

library(data.table)

DT <- structure(list(Timestamp = c("8/12/2018 23:00:00", "8/13/2018 0:00:00", 
                             "8/13/2018 0:10:00", "8/13/2018 0:14:00", "8/13/2018 0:15:00", 
                             "8/13/2018 0:19:00", "8/13/2018 0:29:00", "8/13/2018 0:38:00", 
                             "8/13/2018 0:44:00", "8/13/2018 0:45:00", "8/13/2018 0:58:00", 
                             "8/13/2018 1:01:00"), Value..kW. = 51:62), .Names = c("Timestamp", 
                                                                                   "Value..kW."), class = "data.frame", row.names = c(NA, -12L))
## Convert from data.frame to data.table
setDT(DT)

## Convert to POSIXct
DT[,Timestamp := as.POSIXct(Timestamp, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")]
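This rolling join method can be a bit hard to wrap your head around if you are new to data.table; this example is one of the more advanced ones. If you have not used data.table before, the data.table website is probably a good place to start.

Running help("data.table") will give you a concise description, but there are also good walkthroughs of some of these features written by Ben Gorman on his blog and by Robert Norberg on his blog, which may help with understanding.

Update: It looks like you may just want to carry observations forward rather than take the "closest" value. In that case, the option is a rolling join that carries the last observation forward (roll = +Inf), using the same DT as the starting point.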
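
The carry-forward idea can also be sketched without any packages. The following is a minimal base-R illustration, not part of the answer above: it assumes the readings are sorted by time, uses the timestamps with seconds from the question, and the names `readings`, `grid`, and `idx` are made up for this example. `findInterval()` returns, for each grid point, the position of the last reading at or before it.

```r
# Readings from the question (with seconds), parsed as UTC
readings <- data.frame(
  Timestamp = as.POSIXct(
    c("8/12/2018 23:00:06", "8/13/2018 0:00:16", "8/13/2018 0:10:26",
      "8/13/2018 0:14:36", "8/13/2018 0:15:00", "8/13/2018 0:19:57",
      "8/13/2018 0:29:09", "8/13/2018 0:38:17", "8/13/2018 0:44:59",
      "8/13/2018 0:45:00", "8/13/2018 0:58:47", "8/13/2018 1:01:57"),
    format = "%m/%d/%Y %H:%M:%S", tz = "UTC"),
  Value = 51:62
)

# Regular 15-minute grid covering the intervals asked for
grid <- seq(as.POSIXct("2018-08-13 00:00:00", tz = "UTC"),
            as.POSIXct("2018-08-13 01:00:00", tz = "UTC"),
            by = "15 min")

# Position of the last reading at or before each grid point (0 if none)
idx <- findInterval(as.numeric(grid), as.numeric(readings$Timestamp))
idx[idx == 0] <- NA  # grid points before the first reading get NA

df2 <- data.frame(Interval = grid, Value = readings$Value[idx])
print(df2)
```

With the question's data this yields the values 51, 55, 57, 60, 61 for 0:00 through 1:00, which matches the df2 the question asks for.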

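Depending on the structure of the input data and the expected result, the OP has several options.

From the question and the sample dataset, it is not fully clear what the expected result should look like if the input data contain gaps, i.e. stretches longer than 15 minutes with no recorded data. How does the OP want gaps in the input data to be reflected in the result?

Edit: The OP has supplied two slightly different datasets. Both are used below to demonstrate the effect of the input data on the result.

The variants below use lubridate and data.table. df is assumed to be already sorted by Timestamp.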
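## Code for the data.table rolling-join answer (continues from DT above)
## Option 1: closest reading, roll = "nearest"
## Get Start and Ends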
Start <- min(as.POSIXct(cut.POSIXt(DT[,Timestamp],breaks = c("15 min")), tz = "UTC"))
End <- max(as.POSIXct(cut.POSIXt(DT[,Timestamp],breaks = c("15 min")), tz = "UTC"))
## Generate data.table with a sequence
SummaryDT <- data.table(TimeStamp15 = seq.POSIXt(from = Start, to = End, by = "15 min"))

print(SummaryDT)
#            TimeStamp15
# 1: 2018-08-12 23:00:00
# 2: 2018-08-12 23:15:00
# 3: 2018-08-12 23:30:00
# 4: 2018-08-12 23:45:00
# 5: 2018-08-13 00:00:00
# 6: 2018-08-13 00:15:00
# 7: 2018-08-13 00:30:00
# 8: 2018-08-13 00:45:00
# 9: 2018-08-13 01:00:00
## Set keys
setkey(SummaryDT,TimeStamp15)
setkey(DT,Timestamp)

## Create a new column in SummaryDT with the closest measurement
SummaryDT[DT, Closest_Value_kW := `i.Value..kW.` , roll = "nearest"]
print(SummaryDT)
#            TimeStamp15 Closest_Value_kW
# 1: 2018-08-12 23:00:00               51
# 2: 2018-08-12 23:15:00               NA
# 3: 2018-08-12 23:30:00               NA
# 4: 2018-08-12 23:45:00               NA
# 5: 2018-08-13 00:00:00               52
# 6: 2018-08-13 00:15:00               56
# 7: 2018-08-13 00:30:00               57
# 8: 2018-08-13 00:45:00               60
# 9: 2018-08-13 01:00:00               62
## Option 2: carry the last observation forward, roll = +Inf
## Get Start and Ends
Start <- min(as.POSIXct(cut.POSIXt(DT[,Timestamp],breaks = c("15 min")), tz = "UTC"))
End <- max(as.POSIXct(cut.POSIXt(DT[,Timestamp],breaks = c("15 min")), tz = "UTC"))
## Generate data.table with a sequence
SummaryDT <- data.table(TimeStamp15 = seq.POSIXt(from = Start, to = End, by = "15 min"))

## Set keys
setkey(SummaryDT,TimeStamp15)
setkey(DT,Timestamp)
## Do a rolling join
FinalDT <- DT[SummaryDT, roll = +Inf]

print(FinalDT)
#              Timestamp Value..kW.
# 1: 2018-08-12 23:00:00         51
# 2: 2018-08-12 23:15:00         51
# 3: 2018-08-12 23:30:00         51
# 4: 2018-08-12 23:45:00         51
# 5: 2018-08-13 00:00:00         52
# 6: 2018-08-13 00:15:00         55
# 7: 2018-08-13 00:30:00         57
# 8: 2018-08-13 00:45:00         60
# 9: 2018-08-13 01:00:00         61
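# Code for the lubridate + data.table answer
# prepare: convert df to a data.table and parse Timestamp
library(lubridate)
library(data.table)
setDT(df)[, Timestamp := mdy_hms(Timestamp)]
# aggregate: keep the last row of each 15-minute interval (timestamps rounded up)
df[, .SD[.N], by = .(Interval = ceiling_date(Timestamp, "15 min"))]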
              Interval Value..kW.
1: 2018-08-12 23:00:00         51
2: 2018-08-13 00:00:00         52
3: 2018-08-13 00:15:00         55
4: 2018-08-13 00:30:00         57
5: 2018-08-13 00:45:00         60
6: 2018-08-13 01:00:00         61
7: 2018-08-13 01:15:00         62
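# if df is not sorted by Timestamp, pick the latest row of each interval explicitly
df[, .SD[which.max(Timestamp)], keyby = .(Interval = ceiling_date(Timestamp, "15 min"))]
# same aggregation on df0 (timestamps with seconds)
df0[, .SD[.N], by = .(Interval = ceiling_date(Timestamp, "15 min"))]
              Interval Value..kW.
1: 2018-08-12 23:15:00         51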
2: 2018-08-13 00:15:00         55
3: 2018-08-13 00:30:00         57
4: 2018-08-13 00:45:00         60
5: 2018-08-13 01:00:00         61
6: 2018-08-13 01:15:00         62
# also list intervals without readings (they show up as NA)
step <- "15 min"
df[, .SD[.N], by = .(Interval = ceiling_date(Timestamp, step))][
  .(seq(min(Interval), max(Interval), step)), on = .(Interval = V1)]
               Interval Value..kW.
 1: 2018-08-12 23:00:00         51
 2: 2018-08-12 23:15:00         NA
 3: 2018-08-12 23:30:00         NA
 4: 2018-08-12 23:45:00         NA
 5: 2018-08-13 00:00:00         52
 6: 2018-08-13 00:15:00         55
 7: 2018-08-13 00:30:00         57
 8: 2018-08-13 00:45:00         60
 9: 2018-08-13 01:00:00         61
10: 2018-08-13 01:15:00         62
# same on df0
df0[, .SD[.N], by = .(Interval = ceiling_date(Timestamp, step))][
  .(seq(min(Interval), max(Interval), step)), on = .(Interval = V1)]
              Interval Value..kW.
1: 2018-08-12 23:15:00         51
2: 2018-08-12 23:30:00         NA
3: 2018-08-12 23:45:00         NA
4: 2018-08-13 00:00:00         NA
5: 2018-08-13 00:15:00         55
6: 2018-08-13 00:30:00         57
7: 2018-08-13 00:45:00         60
8: 2018-08-13 01:00:00         61
9: 2018-08-13 01:15:00         62
# rolling join onto a regular 15-minute grid (last observation carried forward)
step <- "15 min"
df[.(seq(floor_date(min(Timestamp), step), ceiling_date(max(Timestamp), step),by = step)), 
   on = .(Timestamp = V1), roll = TRUE]
              Timestamp Value..kW.
 1: 2018-08-12 23:00:00         51
 2: 2018-08-12 23:15:00         51
 3: 2018-08-12 23:30:00         51
 4: 2018-08-12 23:45:00         51
 5: 2018-08-13 00:00:00         52
 6: 2018-08-13 00:15:00         55
 7: 2018-08-13 00:30:00         57
 8: 2018-08-13 00:45:00         60
 9: 2018-08-13 01:00:00         61
10: 2018-08-13 01:15:00         62
# same on df0: the grid starts before the first reading, hence the leading NA
df0[.(seq(floor_date(min(Timestamp), step), ceiling_date(max(Timestamp), step),by = step)), 
   on = .(Timestamp = V1), roll = TRUE]
              Timestamp Value..kW.
 1: 2018-08-12 23:00:00         NA
 2: 2018-08-12 23:15:00         51
 3: 2018-08-12 23:30:00         51
 4: 2018-08-12 23:45:00         51
 5: 2018-08-13 00:00:00         51
 6: 2018-08-13 00:15:00         55
 7: 2018-08-13 00:30:00         57
 8: 2018-08-13 00:45:00         60
 9: 2018-08-13 01:00:00         61
10: 2018-08-13 01:15:00         62
# start the grid at the first interval boundary after the first reading instead
df0[.(seq(ceiling_date(min(Timestamp), step), ceiling_date(max(Timestamp), step),by = step)), 
    on = .(Timestamp = V1), roll = TRUE]

             Timestamp Value..kW.
1: 2018-08-12 23:15:00         51
2: 2018-08-12 23:30:00         51
3: 2018-08-12 23:45:00         51
4: 2018-08-13 00:00:00         51
5: 2018-08-13 00:15:00         55
6: 2018-08-13 00:30:00         57
7: 2018-08-13 00:45:00         60
8: 2018-08-13 01:00:00         61
9: 2018-08-13 01:15:00         62
# Data
# first dataset: dput() output from the question (seconds are all zero)
df <-
structure(list(Timestamp = c("8/12/2018 23:00:00", "8/13/2018 0:00:00", 
"8/13/2018 0:10:00", "8/13/2018 0:14:00", "8/13/2018 0:15:00", 
"8/13/2018 0:19:00", "8/13/2018 0:29:00", "8/13/2018 0:38:00", 
"8/13/2018 0:44:00", "8/13/2018 0:45:00", "8/13/2018 0:58:00", 
"8/13/2018 1:01:00"), Value..kW. = 51:62), .Names = c("Timestamp", 
"Value..kW."), class = "data.frame", row.names = c(NA, -12L))
# second dataset: the table as printed in the question (with seconds)
df0 <- data.frame(
readr::read_table("        Timestamp   Value.(kW)
8/12/2018 23:00:06  51
8/13/2018 0:00:16   52
8/13/2018 0:10:26   53
8/13/2018 0:14:36   54
8/13/2018 0:15:00   55
8/13/2018 0:19:57   56
8/13/2018 0:29:09   57
8/13/2018 0:38:17   58
8/13/2018 0:44:59   59
8/13/2018 0:45:00   60
8/13/2018 0:58:47   61
8/13/2018 1:01:57   62
"))
# prepare
library(lubridate)
library(data.table)
setDT(df0)[, Timestamp := mdy_hms(Timestamp)]