R: finding a preceding larger transaction per reference in data.table

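I have a large data file of references with different dates and quantities. Each row is a transaction with a date and a quantity. I need to find out whether the transactions below a threshold were preceded by a larger transaction (in terms of quantity). I have managed to do this, but I cannot think of a simpler way, and I am sure one exists. Thanks for any hints. Here is a fully reproducible example: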

# load required package
require(data.table)

# make it fully reproducible
set.seed(1)
a <- data.table(ref = sample(LETTERS[1:10], 300, TRUE), dates = sample(seq(as.Date("2017-08-01"), as.Date("2017-12-01"), "day"), 300, TRUE), qty = sample(1:500, 300, TRUE))

# Compute some intermediate tables
#   First one has all records below the threshold (20) with their dates
temp1 <- a[, .(dates, qLess = qty < 20, qty), by = ref][qLess == TRUE,]

#   Second one has all records above threshold with minimum dates
temp2 <- a[, .(qGeq = qty >= 20, dates), by = ref][qGeq == TRUE,][, min(dates), by = ref]

# Join both tables on ref, filter those below the threshold and filter the ones that are actually preceded (prec) by a larger order. THIS IS THE EXPECTED RESULT
temp1[temp2, on = "ref"][, prec := V1 < dates][qLess == TRUE,][prec == TRUE,]

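This is fairly simple. Set the key so the table is sorted by ref and dates, then flag the "big" orders with 1. For big orders, set the preceding big-order date to the order's own date; for small orders, leave it as NA. Then fill the big-order dates forward within each ref. The result gives, for every order, the most recent big order, with a missing value if no big order precedes it.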

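setkey(a, ref, dates)
# flag big orders (qty at or above the threshold of 20) with 1, small ones with 0
a[, is_big := (qty >= 20) + 0L]
# big orders carry their own date; small orders start as NA
a[is_big == 1, preceding_big_date := dates]
# na.rm = FALSE keeps leading NAs, so every group keeps its original length
a[, preceding_big_date := zoo::na.locf(preceding_big_date, na.rm = FALSE), by = ref]
# small orders only; preceding_big_date is NA where no big order came before
new_result <- a[is_big == 0, ]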
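
The flag-and-fill idea can be checked by hand on a tiny table. The following is only a sketch: the table toy and the helper fill_forward are made up for illustration and are not part of the original answer. The helper is a base-R stand-in for a forward fill that leaves leading NAs in place, so each group keeps one value per row.

```r
library(data.table)

# Hypothetical helper (an assumption, not from the answer above): carry the
# last non-NA value forward and leave leading NAs untouched.
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # position of the latest non-NA value, 0 if none yet
  idx[idx == 0] <- NA                      # nothing to carry forward yet
  x[idx]
}

# Made-up data: A has a small order before and after a big one,
# B has a small order only before its big one.
toy <- data.table(
  ref   = c("A", "A", "A", "B", "B"),
  dates = as.Date(c("2017-08-01", "2017-08-05", "2017-08-09",
                    "2017-08-02", "2017-08-03")),
  qty   = c(5L, 100L, 7L, 3L, 50L)
)

setkey(toy, ref, dates)
toy[qty >= 20, preceding_big_date := dates]   # big orders carry their own date
toy[, preceding_big_date := fill_forward(preceding_big_date), by = ref]

# Small orders that really are preceded by a big order
small_preceded <- toy[qty < 20 & !is.na(preceding_big_date)]
print(small_preceded)
```

Only the A order of 2017-08-09 should survive the final filter, since the other two small orders come before any big order for their ref. Note that a big order placed on the same day as a small one may sort ahead of it and would then count as preceding, which differs from the strict date comparison in the question.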

Another approach, using only data.table's non-equi join capabilities:

setorder(a, ref, dates)
a[qty < 20][a[qty >= 20]
            , on = .(ref, dates > dates)
            , prev.big.date := i.dates, by = .EACHI][]
    ref      dates qty prev.big.date
 1:   A 2017-09-16   5    2017-09-12
 2:   A 2017-09-27  16    2017-09-19
 3:   B 2017-09-17  19    2017-09-16
 4:   B 2017-09-30  19    2017-09-28
 5:   B 2017-10-04   6    2017-10-01
 6:   C 2017-08-14   6    2017-08-12
 7:   C 2017-10-08   1    2017-10-01
 8:   C 2017-10-24  18    2017-10-22
 9:   D 2017-10-20   7    2017-10-18
10:   F 2017-10-20  11    2017-10-11
11:   F 2017-11-23  18    2017-11-22
12:   G 2017-11-15  15    2017-11-12
13:   H 2017-09-30  14    2017-09-28
14:   H 2017-10-05  16    2017-09-28
15:   H 2017-10-29  18    2017-10-26
16:   I 2017-10-27   9    2017-10-19
17:   J 2017-09-23   3    2017-09-17
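
A rolling join might be a third way to get the same kind of result. This is a sketch under assumptions, not something from the answers above: the table toy and the column name prev_big_date are made up, and roll = TRUE matches a big order on the same date as well as earlier ones, so it is slightly looser than the strict dates > dates condition used in the non-equi join.

```r
library(data.table)

# Made-up data for illustration only
toy <- data.table(
  ref   = c("A", "A", "A", "B", "B"),
  dates = as.Date(c("2017-08-01", "2017-08-05", "2017-08-09",
                    "2017-08-02", "2017-08-03")),
  qty   = c(5L, 100L, 7L, 3L, 50L)
)

# Big orders, keeping a copy of their date under a separate (hypothetical) name
big   <- toy[qty >= 20, .(ref, dates, prev_big_date = dates)]
small <- toy[qty < 20]

# For each small order, roll the most recent big order of the same ref forward
rolled <- big[small, on = .(ref, dates), roll = TRUE]

# Keep only the small orders that have a big order at or before their date
preceded <- rolled[!is.na(prev_big_date)]
print(preceded)
```

Here dates in the result is the date of the small order and prev_big_date is the date of the big order rolled forward to it; small orders without an earlier big order get NA and are dropped by the last filter.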