R: How to calculate a rolling quantile per day on intraday data with data.table

Tags: r, data.table, quantile, rollapply

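I would like to calculate a rolling quantile with data.table on a table that holds data for several groups. For each group I have multiple days, and within each day I have multiple observations. I do not want a rolling quantile for every single observation in the table. Instead, I want to always take the last, say, 3 days of data, calculate one quantile, and then move on.

I have data like this: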

library(data.table)
library(zoo)  # provides rollapply()

# Two groups, 10 days per group, 10 observations per day
test2 <- data.table(group = rep(c("a", "b"), each = 100),
                    date = rep(rep(seq(from = as.Date('2017-01-01'),
                                       to = as.Date('2017-01-10'),
                                       by = "day"), each = 10), 2),
                    time = rep(rep(seq(from = 1, 10, by = 1), times = 10), 2),
                    some_data = rnorm(200) + c(1:20, 20:1, 30:1, 1:30, 30:1, 1:20, 20:1, 1:30))

# Rolling 30% quantile over the last 30 observations (3 days x 10 per day) within
# each group, then keep only the last observation of each day (every 10th row)
tests_result <- test2[, list(date = date,
                             q_30 = rollapply(some_data,
                                              30, quantile,
                                              probs = 0.3,
                                              fill = NA, align = "right")),
                      by = "group"][seq(from = 10, to = 200, by = 10)]

    group       date      q_30
 1:     a 2017-01-01        NA
 2:     a 2017-01-02        NA
 3:     a 2017-01-03 10.284081
 4:     a 2017-01-04  8.281827
 5:     a 2017-01-05  8.281827
 6:     a 2017-01-06  8.281827
 7:     a 2017-01-07 10.274793
 8:     a 2017-01-08  4.749455
 9:     a 2017-01-09  4.749455
10:     a 2017-01-10  9.246267
11:     b 2017-01-01        NA
12:     b 2017-01-02        NA
13:     b 2017-01-03 10.145996
14:     b 2017-01-04  5.423782
15:     b 2017-01-05  5.423782
16:     b 2017-01-06  9.741683
17:     b 2017-01-07 10.123940
18:     b 2017-01-08  4.347293
19:     b 2017-01-09  4.347293
20:     b 2017-01-10  9.177718

To summarize the challenges:

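  • Calculate the quantile only once per day, rather than 10 times.
  • Perform the quantile calculation over a given number of days even if the number of observations per day varies. For example, if I want a quantile based on 2 days and the first day has 10 values while the second day has 20, I should get a result based on all 30 values from those two days, and that result should be assigned to the date of the second day.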
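
One way to meet both requirements, sketched under assumptions rather than taken from the question: collect each day's observations into a list column, then pool the last `n_days` list elements within each group and compute a single quantile. The helper name `rolling_day_quantile`, its parameters, and the toy data are hypothetical. The column names `group`, `date`, and `some_data` follow the sample data.

```r
library(data.table)

# Hypothetical helper (not from the original post): one quantile per group and
# day, pooled over the last `n_days` observed days of that group.
rolling_day_quantile <- function(dt, n_days = 3L, prob = 0.3) {
  dt <- copy(dt)                      # leave the caller's table untouched
  setorder(dt, group, date)
  # One list element per (group, date), holding that day's observations
  daily <- dt[, .(vals = list(some_data)), by = .(group, date)]
  daily[, q := vapply(seq_len(.N), function(i) {
    if (i < n_days) return(NA_real_)  # window not yet complete
    quantile(unlist(vals[(i - n_days + 1L):i]), probs = prob, names = FALSE)
  }, numeric(1)), by = group]
  daily[, vals := NULL][]
}

# Toy data with an uneven number of observations per day (1, 2, and 3)
toy <- data.table(
  group     = "a",
  date      = as.Date("2017-01-01") + c(0, 1, 1, 2, 2, 2),
  some_data = c(1, 2, 3, 4, 5, 6)
)
res <- rolling_day_quantile(toy, n_days = 2L, prob = 0.5)
print(res)
```

With these toy values the second day pools 1, 2, 3 and the third day pools 2 through 6, so the medians are 2 and 4. The first day is `NA` because its window is incomplete. The quantile function runs once per day, not once per observation, but the list column keeps a copy of the data, so memory use on a large table is something to check.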
Edit:

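I came up with an approach that copes with the size of the dataset. I think it can still be improved, though, so I would like to hear any suggestions.

My approach on the sample dataset is as follows.

First, compute the total number of observations in each run of 3 consecutive days, together with the row number in the original dataset of the last observation of each day. These new variables are called in_3 and orig_row.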

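# Per group and day: observation count (N) and row index in test2 of the day's last observation
test3 <- test2[, list(.N, orig_row = .I[.N]), by = c("group", "date")][
  , list(date,
         in_3 = rollapply(N, 3, sum, fill = NA, align = "right"),  # observations in the last 3 days
         orig_row),
  by = "group"]
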
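
The post does not show how `quantiles` is computed before it is assigned. A minimal sketch of one possibility, assuming `test2` is sorted by group and date so that the `in_3` rows ending at `orig_row` are exactly the observations of the last three days of that group. The `mapply` step and the seed are my assumptions, not the author's code.

```r
library(data.table)
library(zoo)  # rollapply()

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
test2 <- data.table(group = rep(c("a", "b"), each = 100),
                    date = rep(rep(seq(from = as.Date("2017-01-01"),
                                       to = as.Date("2017-01-10"),
                                       by = "day"), each = 10), 2),
                    time = rep(rep(1:10, times = 10), 2),
                    some_data = rnorm(200))

# Same aggregation as in the post: daily counts, 3-day totals, last row per day
test3 <- test2[, list(.N, orig_row = .I[.N]), by = c("group", "date")][
  , list(date,
         in_3 = rollapply(N, 3, sum, fill = NA, align = "right"),
         orig_row),
  by = "group"]

# Assumed step: one quantile per day, taken over the in_3 rows ending at orig_row
quantiles <- mapply(function(end_row, n_obs) {
  if (is.na(n_obs)) return(NA_real_)  # fewer than 3 days available so far
  rows <- (end_row - n_obs + 1L):end_row
  quantile(test2$some_data[rows], probs = 0.3, names = FALSE)
}, test3$orig_row, test3$in_3)
```

Because the window is defined by row counts and not by a fixed width of 30, this would also cope with a varying number of observations per day, as long as the rows stay ordered by group and date.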
Finally, assign the result to the aggregated dataset:

test3[, q03 := quantiles]


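I also tried running this in parallel, but my laptop then ran out of physical memory and started swapping heavily to disk, which was much slower than running on a single core.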

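Comments:

What is your expected output? What do you mean by a rolling quantile over a specific number of days? That is not what your "theoretical" code does.

That is why, after the code, I said "and then take the last observation of each day". I want one number for every date in the dataset: the quantile computed from that day's observations plus the observations of the preceding days.

Ah, you are right, the code I provided as an example of what I want does not actually do exactly what I had in mind. An adjustment is coming in a moment.

@mtoto My bad, only now is the expected result correct.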