R: Fill in missing dates and times in a quarter-hourly time series by subgroup


I have data of the following type, only much larger:

DIST TALUK HOBLI CODE DATE      REC_TIME    RAIN
DK  P1  A1  1503    01-06-19    00:00:00    22.5
DK  P1  A1  1503    01-06-19    00:15:00    23.0
DK  P1  A1  1503    01-06-19    00:30:00    23.0
DK  P1  A1  1503    01-06-19    00:45:00    23.0
DK  P1  A1  1503    01-06-19    01:00:00    23.0
DK  P1  A1  1503    01-06-19    01:15:00    23.0
DK  P1  A1  1503    01-06-19    01:30:00    23.0
DK  P1  A1  1503    01-06-19    01:45:00    23.0
DK  P1  A1  1503    01-06-19    02:00:00    23.0
DK  P1  A2  515     01-06-19    22:15:00    23.0
DK  P1  A2  515     01-06-19    22:30:00    23.0
DK  P1  A2  515     01-06-19    22:45:00    23.0
DK  P1  A2  515     01-06-19    23:00:00    23.0
DK  P2  A3  633     01-07-19    22:15:00    23.0
DK  P2  A3  633     01-07-19    22:30:00    24.0
DK  P2  A3  633     01-07-19    22:45:00    24.0
DK  P2  A3  633     01-07-19    23:00:00    24.0
DK  P2  A3  633     01-07-19    23:15:00    24.0
DK  P2  A3  633     01-07-19    23:30:00    29.0
DK  P2  A3  633     01-07-19    23:45:00    32.0
DK  P2  A3  633     02-07-19    00:00:00    36.0
DK  P2  A3  633     02-07-19    00:15:00    36.0
DK  P3  B1  845     01-06-19    05:30:00    36.0
DK  P3  B1  845     01-06-19    05:45:00    36.0
DK  P3  B1  845     01-06-19    06:00:00    36.0
DK  P3  B1  845     01-06-19    06:15:00    36.0
DK  P3  B1  845     01-06-19    06:30:00    36.0
DK  P3  B1  845     01-06-19    06:45:00    36.0
DK  P3  B1  845     01-06-19    07:00:00    36.0
DK  P3  B1  845     01-06-19    07:15:00    36.0
DK  P3  B2  789     01-06-19    07:30:00    36.0
DK  P3  B2  789     01-06-19    07:45:00    36.0
DK  P3  B2  789     01-06-19    08:00:00    36.0
DK  P3  B2  789     01-06-19    08:15:00    36.0
DK  P3  B2  789     01-06-19    08:30:00    36.0
DK  P3  B2  789     01-06-19    08:45:00    0.0
DK  P3  B2  789     01-06-19    09:00:00    0.0
DK  P3  B2  789     01-06-19    09:15:00    0.0
DK  P3  B2  789     01-06-19    09:30:00    0.0
DK  P4  B4  801     22-08-19    00:00:00    0.0
DK  P4  B4  801     22-08-19    00:15:00    0.0
DK  P4  B4  801     22-08-19    00:30:00    0.5
DK  P4  B4  801     22-08-19    00:45:00    0.5
DK  P4  B4  801     22-08-19    22:30:00    0.5
DK  P4  B4  801     22-08-19    22:45:00    0.5
DK  P4  B4  801     30-11-19    21:45:00    0.5
DK  P4  B4  801     30-11-19    22:00:00    0.5
DK  P4  B4  801     30-11-19    22:15:00    0.5
DK  P4  B4  801     30-11-19    22:30:00    2.0
DK  P4  B4  801     30-11-19    22:45:00    5.5
DK  P4  B4  801     30-11-19    23:00:00    5.5
DK  P4  B4  801     30-11-19    23:15:00    5.5
DK  P4  B4  801     30-11-19    23:30:00    5.5
DK  P4  B4  801     30-11-19    23:45:00    5.5
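The data runs from 01-06-19 (01-Jun-19) to 30-11-19 (30-Nov-19) with four readings per hour, but for some stations the observations for certain days and times in this series are missing. I want to fill in those missing dates and record times so that every station has observations from 01-06-19 to 30-11-19. The RAIN variable for those added dates and record times should be filled with NA.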

I tried several options suggested on Stack Overflow but did not get the desired result. I also tried the following:

df_1 <- df[, .(RECORDED_DATE = seq(as.Date(min(df$RECORDED_DATE)), as.Date(max(df$RECORDED_DATE)), "day")), by = list(DISTRICT, TALUKNAME, HOBLINAME, TRGCODE, HOUR)]   
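I also tried complete from tidyverse, but did not get the expected result because of errors in the data frame. The dates are stored as character, and the merge does not happen either when using tidyverse or after converting them to double. I tried converting the character values to numeric, but the date column ended up filled with NA.
Any help would be appreciated.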
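
The NA-filled date column described in the question is what as.numeric() returns for strings such as "01-06-19", because they are not numbers. A minimal sketch, assuming the dates are day-month-year strings as in the sample (the vector names are illustrative, not from the original data):

```r
# Hypothetical values in the same layout as the DATE and REC_TIME columns
date_chr <- c("01-06-19", "30-11-19")
time_chr <- c("00:00:00", "23:45:00")

# Coercing such strings to numbers cannot work and yields NA (with a warning)
bad <- suppressWarnings(as.numeric(date_chr))

# Parsing with an explicit format gives real Date / POSIXct values
good_date <- as.Date(date_chr, format = "%d-%m-%y")
good_dt   <- as.POSIXct(paste(date_chr, time_chr),
                        format = "%d-%m-%y %H:%M:%S", tz = "UTC")

print(bad)
print(good_date)
print(good_dt)
```

Fixing the time zone (UTC here, as an assumption) keeps daylight-saving shifts from affecting the 15-minute grid.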

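Using dplyr and tidyr, we can combine the date and time columns with unite, create a 15-minute sequence from the min to the max of DATETIME inside complete, and then split the date and time back into separate columns.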

library(dplyr)
library(tidyr)

df %>%
  unite(DATETIME, DATE, REC_TIME, sep = " ", remove = FALSE) %>%
  mutate(DATETIME = as.POSIXct(DATETIME, format = "%d-%m-%y %T", tz = "UTC")) %>%
  complete(CODE, DATETIME = seq(min(DATETIME), max(DATETIME), by = "15 min")) %>%
  mutate(DATE = as.Date(DATETIME), REC_TIME = format(DATETIME, "%T")) %>%
  select(-DATETIME) %>%
  group_by(CODE) %>%
  fill(DIST, TALUK, HOBLI, .direction = "updown")
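
The pipeline above spans the range between the earliest and latest DATETIME present in the data. If every station should cover the whole stated period regardless of what was observed, the grid end points can be fixed instead. A base R sketch of that idea, using a small made-up data frame (toy, grid, and full are illustrative names, and the two-hour period is an assumption to keep the output small):

```r
# Toy data: two stations with gaps, loosely following the question's layout
toy <- data.frame(
  CODE = c(1503L, 1503L, 515L),
  DATETIME = as.POSIXct(c("2019-06-01 00:00:00", "2019-06-01 00:30:00",
                          "2019-06-01 01:15:00"), tz = "UTC"),
  RAIN = c(22.5, 23.0, 23.0)
)

# Fixed period; for the real data this would run from 01-06-19 00:00
# to 30-11-19 23:45
start <- as.POSIXct("2019-06-01 00:00:00", tz = "UTC")
end   <- as.POSIXct("2019-06-01 01:45:00", tz = "UTC")

# Every station crossed with every 15-minute timestamp
grid <- expand.grid(
  CODE = unique(toy$CODE),
  DATETIME = seq(start, end, by = "15 min"),
  KEEP.OUT.ATTRS = FALSE
)

# Left join: timestamps without an observation get RAIN = NA
full <- merge(grid, toy, by = c("CODE", "DATETIME"), all.x = TRUE)
full <- full[order(full$CODE, full$DATETIME), ]

# Split back into the original DATE / REC_TIME layout
full$DATE     <- format(full$DATETIME, "%d-%m-%y", tz = "UTC")
full$REC_TIME <- format(full$DATETIME, "%H:%M:%S", tz = "UTC")

print(table(full$CODE))
```

Each station ends up with the same number of rows, which is a quick way to confirm that the grid is complete.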
Data

df <- structure(list(DIST = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "DK", class = "factor"), 
TALUK = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("P1", 
"P2", "P3", "P4"), class = "factor"), HOBLI = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("A1", "A2", "A3", 
"B1", "B2", "B4"), class = "factor"), CODE = c(1503L, 1503L, 
1503L, 1503L, 1503L, 1503L, 1503L, 1503L, 1503L, 515L, 515L, 
515L, 515L, 633L, 633L, 633L, 633L, 633L, 633L, 633L, 633L, 
633L, 845L, 845L, 845L, 845L, 845L, 845L, 845L, 845L, 789L, 
789L, 789L, 789L, 789L, 789L, 789L, 789L, 789L, 801L, 801L, 
801L, 801L, 801L, 801L, 801L, 801L, 801L, 801L, 801L, 801L, 
801L, 801L, 801L), DATE = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L), .Label = c("01-06-19", "01-07-19", "02-07-19", 
"22-08-19", "30-11-19"), class = "factor"), REC_TIME = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 29L, 30L, 31L, 32L, 29L, 
30L, 31L, 32L, 33L, 34L, 35L, 1L, 2L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 
26L, 1L, 2L, 3L, 4L, 30L, 31L, 27L, 28L, 29L, 30L, 31L, 32L, 
33L, 34L, 35L), .Label = c("00:00:00", "00:15:00", "00:30:00", 
"00:45:00", "01:00:00", "01:15:00", "01:30:00", "01:45:00", 
"02:00:00", "05:30:00", "05:45:00", "06:00:00", "06:15:00", 
"06:30:00", "06:45:00", "07:00:00", "07:15:00", "07:30:00", 
"07:45:00", "08:00:00", "08:15:00", "08:30:00", "08:45:00", 
"09:00:00", "09:15:00", "09:30:00", "21:45:00", "22:00:00", 
"22:15:00", "22:30:00", "22:45:00", "23:00:00", "23:15:00", 
"23:30:00", "23:45:00"), class = "factor"), RAIN = c(22.5, 
23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 
24, 24, 29, 32, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 
36, 36, 36, 36, 0, 0, 0, 0, 0, 0, 0.5, 0.5, 0.5, 0.5, 0.5, 
0.5, 0.5, 2, 5.5, 5.5, 5.5, 5.5, 5.5)), class = "data.frame", row.names = c(NA, -54L))

df

If your dataset is large, using data.table may be faster.

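# Cross join every station CODE with a full 15-minute grid, then roll each
# grid point to the nearest observed row so DIST, TALUK and HOBLI are filled
ans <- DT[CJ(CODE, dt=seq(min(dt), max(dt), by="15 mins"), unique=TRUE), 
    on=.(CODE, dt), roll="nearest"]

# Where the matched row's own timestamp differs from the grid point, the
# reading is borrowed, so rewrite DATE and REC_TIME and set RAIN to NA
ans[DateTime!=dt, `:=` (
    DATE=format(dt, format="%d-%m-%y"), 
    REC_TIME=format(dt, format="%H:%M:%S"), 
    RAIN=NA_real_
    )][,
        DateTime := NULL]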

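ans

Comments

It gives the error: Error in seq.int(0, to0 - from, by): 'to' must be a finite number. Also, can we fill in the corresponding names and station code for the missing dates instead of NA?

@Ajay Which column is your station code? If it is DIST, change the complete line to complete(DIST, DATETIME = seq(min(DATETIME), max(DATETIME), by = "15 min")). To fill in the corresponding values, you can add %>% fill(everything()) at the end.

The station code column in the dataset above is the CODE column. I still get the error: Error in seq.int(0, to0 - from, by): 'to' must be a finite number. Do I need to write seq(as.Date(min(DATETIME)))? DATETIME is not a Date object, so I cannot write it that way.

I have updated the data I am using. Can you check with this data whether you get the answer? With your dataset, it assigns NA to the RAIN column.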
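
The "'to' must be a finite number" error reported in these comments is typically what seq() raises when one of its end points is NA, which happens when at least one DATETIME value failed to parse. A hedged sketch of how to check for that, using made-up strings (the malformed second value is an assumption for illustration, not taken from the asker's data):

```r
# Made-up input: the second string does not match the expected layout
raw <- c("01-06-19 00:00:00", "2019-06-01 00:15", "01-06-19 00:30:00")
dt  <- as.POSIXct(raw, format = "%d-%m-%y %H:%M:%S", tz = "UTC")

# Values that failed to parse become NA, so min() and max() are NA as well
failed <- which(is.na(dt))
print(raw[failed])

# seq() with an NA end point raises an error; capture the message
# instead of stopping the script
res <- tryCatch(seq(min(dt), max(dt), by = "15 min"),
                error = function(e) conditionMessage(e))
print(res)

# Once the unparsed values are excluded (or fixed), the sequence can be built
ok <- seq(min(dt, na.rm = TRUE), max(dt, na.rm = TRUE), by = "15 min")
print(length(ok))
```

Inspecting raw[failed] on the real data shows which rows need a different format string before the grid is built.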
Data for the data.table answer (run this first to create DT):
library(data.table)
DT <- fread("DIST TALUK HOBLI CODE DATE      REC_TIME    RAIN
DK  P1  A1  1503    01-06-19    00:00:00    22.5
DK  P1  A1  1503    01-06-19    00:15:00    23.0
DK  P1  A1  1503    01-06-19    00:30:00    23.0
DK  P1  A1  1503    01-06-19    00:45:00    23.0
DK  P1  A1  1503    01-06-19    01:00:00    23.0
DK  P1  A1  1503    01-06-19    01:15:00    23.0
DK  P1  A1  1503    01-06-19    01:30:00    23.0
DK  P1  A1  1503    01-06-19    01:45:00    23.0
DK  P1  A1  1503    01-06-19    02:00:00    23.0
DK  P1  A2  515     01-06-19    22:15:00    23.0
DK  P1  A2  515     01-06-19    22:30:00    23.0
DK  P1  A2  515     01-06-19    22:45:00    23.0
DK  P1  A2  515     01-06-19    23:00:00    23.0
DK  P2  A3  633     01-07-19    22:15:00    23.0
DK  P2  A3  633     01-07-19    22:30:00    24.0
DK  P2  A3  633     01-07-19    22:45:00    24.0
DK  P2  A3  633     01-07-19    23:00:00    24.0
DK  P2  A3  633     01-07-19    23:15:00    24.0
DK  P2  A3  633     01-07-19    23:30:00    29.0
DK  P2  A3  633     01-07-19    23:45:00    32.0
DK  P2  A3  633     02-07-19    00:00:00    36.0
DK  P2  A3  633     02-07-19    00:15:00    36.0
DK  P3  B1  845     01-06-19    05:30:00    36.0
DK  P3  B1  845     01-06-19    05:45:00    36.0
DK  P3  B1  845     01-06-19    06:00:00    36.0
DK  P3  B1  845     01-06-19    06:15:00    36.0
DK  P3  B1  845     01-06-19    06:30:00    36.0
DK  P3  B1  845     01-06-19    06:45:00    36.0
DK  P3  B1  845     01-06-19    07:00:00    36.0
DK  P3  B1  845     01-06-19    07:15:00    36.0
DK  P3  B2  789     01-06-19    07:30:00    36.0
DK  P3  B2  789     01-06-19    07:45:00    36.0
DK  P3  B2  789     01-06-19    08:00:00    36.0
DK  P3  B2  789     01-06-19    08:15:00    36.0
DK  P3  B2  789     01-06-19    08:30:00    36.0
DK  P3  B2  789     01-06-19    08:45:00    0.0
DK  P3  B2  789     01-06-19    09:00:00    0.0
DK  P3  B2  789     01-06-19    09:15:00    0.0
DK  P3  B2  789     01-06-19    09:30:00    0.0
DK  P4  B4  801     22-08-19    00:00:00    0.0
DK  P4  B4  801     22-08-19    00:15:00    0.0
DK  P4  B4  801     22-08-19    00:30:00    0.5
DK  P4  B4  801     22-08-19    00:45:00    0.5
DK  P4  B4  801     22-08-19    22:30:00    0.5
DK  P4  B4  801     22-08-19    22:45:00    0.5
DK  P4  B4  801     30-11-19    21:45:00    0.5
DK  P4  B4  801     30-11-19    22:00:00    0.5
DK  P4  B4  801     30-11-19    22:15:00    0.5
DK  P4  B4  801     30-11-19    22:30:00    2.0
DK  P4  B4  801     30-11-19    22:45:00    5.5
DK  P4  B4  801     30-11-19    23:00:00    5.5
DK  P4  B4  801     30-11-19    23:15:00    5.5
DK  P4  B4  801     30-11-19    23:30:00    5.5
DK  P4  B4  801     30-11-19    23:45:00    5.5")
# Join date and time with a space so the string matches the format below
DT[, dt := as.POSIXct(paste(DATE, REC_TIME), format="%d-%m-%y %H:%M:%S")][,
    DateTime := dt]