R cumulative sum, but in reverse order
I have a data.table and I want to add a new column holding the cumulative sum of the var column, but in reverse order.
structure(list(date = c("2020-09-18", "2020-09-25", "2020-09-30",
"2020-10-02", "2020-10-09", "2020-10-16", "2020-10-23", "2020-10-30",
"2020-11-20", "2020-12-31", "2021-01-15", "2021-03-19", "2021-03-31",
"2021-04-16", "2021-06-30", "2022-01-21", "2022-06-17", "2023-01-20"
), var = c(641202L, 85464L, 868557L, 46256L, 13760L, 1034287L,
6473L, 9769L, 653072L, 273695L, 1927442L, 455322L, 67728L, 12948L,
184244L, 401747L, 70496L, 1235L)), row.names = c(NA, -18L), groups = structure(list(
ExpDate = c("2020-09-18", "2020-09-25", "2020-09-30", "2020-10-02",
"2020-10-09", "2020-10-16", "2020-10-23", "2020-10-30", "2020-11-20",
"2020-12-31", "2021-01-15", "2021-03-19", "2021-03-31", "2021-04-16",
"2021-06-30", "2022-01-21", "2022-06-17", "2023-01-20"),
.rows = structure(list(1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14,
15:16, 17:18, 19:20, 21:22, 23:24, 25:26, 27:28, 29:30,
31:32, 33:34, 35:36), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, 18L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x5580c995ccb0>)
I want to add a new column that accumulates the values of the var column from bottom to top:
date var reverse_sum
1: 2020-09-18 641202
2: 2020-09-25 85464
3: 2020-09-30 868557
4: 2020-10-02 46256
5: 2020-10-09 13760
6: 2020-10-16 1034287
7: 2020-10-23 6473
8: 2020-10-30 9769
9: 2020-11-20 653072
10: 2020-12-31 273695
11: 2021-01-15 1927442
12: 2021-03-19 455322
13: 2021-03-31 67728
14: 2021-04-16 12948
15: 2021-06-30 184244
16: 2022-01-21 401747 (71731 + 401747) = 473478 (and so on upwards)
17: 2022-06-17 70496 (70496 + 1235) = 71731 (only the sum will be shown in this column)
18: 2023-01-20 1235 1235
I'm sure there must be a simple one-line solution for this using data.table.
Thanks,
Saurabh

You can simply use rev and cumsum (and then rev again):
dat[, reverse_sum := rev(cumsum(rev(var)))]
dat
# date var reverse_sum
# 1: 2020-09-18 641202 6753697
# 2: 2020-09-25 85464 6112495
# 3: 2020-09-30 868557 6027031
# 4: 2020-10-02 46256 5158474
# 5: 2020-10-09 13760 5112218
# 6: 2020-10-16 1034287 5098458
# 7: 2020-10-23 6473 4064171
# 8: 2020-10-30 9769 4057698
# 9: 2020-11-20 653072 4047929
# 10: 2020-12-31 273695 3394857
# 11: 2021-01-15 1927442 3121162
# 12: 2021-03-19 455322 1193720
# 13: 2021-03-31 67728 738398
# 14: 2021-04-16 12948 670670
# 15: 2021-06-30 184244 657722
# 16: 2022-01-21 401747 473478
# 17: 2022-06-17 70496 71731
# 18: 2023-01-20 1235 1235
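
To see why the rev / cumsum / rev combination gives a bottom-up running total, here is a minimal base R sketch on just the last three var values from the question (rows 16 to 18). The intermediate names step1, step2 and step3 are only for illustration and are not part of the answer above.

```r
# Last three values of var from the question (rows 16 to 18)
x <- c(401747L, 70496L, 1235L)

step1 <- rev(x)         # reverse, so the last row comes first
step2 <- cumsum(step1)  # ordinary running total over the reversed values
step3 <- rev(step2)     # flip back to the original row order

step3
# 473478 71731 1235
```

These three numbers match the reverse_sum values the question asks for in rows 16, 17 and 18.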

You can use the first argument of data.table's [ ] (the i argument) to decide the order in which the rows are processed:
dt[order(-date), reverse_sum := cumsum(var)]
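I like this approach; it guarantees the order (which I assume is an implicit requirement). If the rows are already known to be sorted, though, the ordering overhead (roughly 3x the execution time, fast as it is) may become relevant with larger data.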
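
The difference between the two answers only shows up when the rows are not already sorted by date. The following is a rough sketch on hypothetical toy data (three rows from the question, deliberately shuffled); the column names by_position and by_date are made up for the comparison, and it assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data: three rows from the question, deliberately NOT sorted by date
dt <- data.table(
  date = c("2022-06-17", "2023-01-20", "2022-01-21"),
  var  = c(70496L, 1235L, 401747L)
)

# Position-based: accumulates from the last row upwards, ignoring the dates
dt[, by_position := rev(cumsum(rev(var)))]

# Date-based: accumulates from the latest date to the earliest,
# then writes each result back to the row it belongs to
dt[order(-date), by_date := cumsum(var)]

dt
```

With sorted input both columns would be identical; here only by_date reproduces the totals from the question (71731 for 2022-06-17, 1235 for 2023-01-20, 473478 for 2022-01-21), which is the guarantee the comment refers to.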