R: if < 0 then use 0
I'm having trouble with two things, but the conditional-sum problem is the more important one (and, I think, the less trivial). I want to subtract 1 from the sum for each row, and if the result is …

A quick and dirty way is to group the data:
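library(dplyr)

a %>%
  mutate(
    # previous row's value; default = -1 makes the first row count as a change
    lag_disp = lag(DISPENSED_DURATION, default = -1),
    change = DISPENSED_DURATION != lag_disp,
    # start a new group at each nonzero value that differs from the previous row
    group = cumsum(as.integer(DISPENSED_DURATION & change))
  ) %>%
  group_by(group) %>%
  mutate(DISP_INVT = ifelse(
    group == 1,
    pmax(1 + cumsum(DISPENSED_DURATION - 1), 0),  # first group: add 1 so row 1 keeps its full value
    pmax(cumsum(DISPENSED_DURATION - 1), 0)
  )) %>%
  ungroup() %>%
  select(-c(lag_disp, change, group))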
# id1 id2 date DISPENSED_DURATION DISP_INVT
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 1 1 2020-01-01 4 4
# 2 1 2 2020-01-02 0 3
# 3 1 3 2020-01-03 0 2
# 4 1 4 2020-01-04 0 1
# 5 1 5 2020-01-05 0 0
# 6 1 6 2020-01-06 0 0
# 7 1 7 2020-01-07 0 0
# 8 1 8 2020-01-08 4 3
# 9 1 9 2020-01-09 0 2
# 10 1 10 2020-01-10 0 1
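I'm sure there is a more concise way, but this should work.

What you need here is a reduce (iterative) computation rather than a cumulative (vectorized) one, because the value in each row depends on the computed value of the previous row.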
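library(data.table)

# Carry the clamped result forward: each step uses the previous computed value.
a[, DISP_INVT := Reduce(function(prev, this) max(0, prev + this - 1),
                        DISPENSED_DURATION, accumulate = TRUE)]
a
#     id1 id2       date DISPENSED_DURATION DISP_INVT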
# 1: 1 1 2020-01-01 4 4
# 2: 1 2 2020-01-02 0 3
# 3: 1 3 2020-01-03 0 2
# 4: 1 4 2020-01-04 0 1
# 5: 1 5 2020-01-05 0 0
# 6: 1 6 2020-01-06 0 0
# 7: 1 7 2020-01-07 0 0
# 8: 1 8 2020-01-08 4 3
# 9: 1 9 2020-01-09 0 2
# 10: 1 10 2020-01-10 0 1
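The recurrence behind the `Reduce()` call can also be written as an explicit loop, which may be easier to read. The following is only a sketch in base R: the helper name `clamped_inventory` is hypothetical, and it assumes, as `Reduce()` does when no `init` value is given, that the first row is taken as-is.

```r
# Hypothetical helper: running inventory that loses 1 per row and never drops below 0.
clamped_inventory <- function(x) {
  out <- numeric(length(x))
  if (length(x) == 0) return(out)
  out[1] <- x[1]                       # first value is used unchanged
  for (i in seq_along(x)[-1]) {
    # add this row's amount, subtract 1, and clamp at 0 before moving on
    out[i] <- max(0, out[i - 1] + x[i] - 1)
  }
  out
}

x <- c(4, 0, 0, 0, 0, 0, 0, 4, 0, 0)  # DISPENSED_DURATION from the example
clamped_inventory(x)
# [1] 4 3 2 1 0 0 0 3 2 1
```

If the inventory should restart for each `id1` (an assumption based on the grouping used in the question's attempts), the same function could be applied per group with `a[, DISP_INVT := clamped_inventory(DISPENSED_DURATION), by = id1]`.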
Thank you, William, this works. I'll go with @r2evans's answer because it uses data.table, but I have upvoted yours. Thank you very much.
a[, DISP_INVT := cumsum(DISPENSED_DURATION-1), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 2
3: 1 3 2020-01-03 0 1
4: 1 4 2020-01-04 0 0
5: 1 5 2020-01-05 0 -1
6: 1 6 2020-01-06 0 -2
7: 1 7 2020-01-07 0 -3
8: 1 8 2020-01-08 4 0
9: 1 9 2020-01-09 0 -1
10: 1 10 2020-01-10 0 -2
a[, DISP_INVT := ifelse(cumsum(DISPENSED_DURATION-1)<0,0,cumsum(DISPENSED_DURATION-1)), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 2
3: 1 3 2020-01-03 0 1
4: 1 4 2020-01-04 0 0
5: 1 5 2020-01-05 0 0
6: 1 6 2020-01-06 0 0
7: 1 7 2020-01-07 0 0
8: 1 8 2020-01-08 4 0
9: 1 9 2020-01-09 0 0
10: 1 10 2020-01-10 0 0
a[, DISP_INVT := cumsum(ifelse(DISPENSED_DURATION-1<0,0,DISPENSED_DURATION-1)), id1]
id1 id2 date DISPENSED_DURATION DISP_INVT
1: 1 1 2020-01-01 4 3
2: 1 2 2020-01-02 0 3
3: 1 3 2020-01-03 0 3
4: 1 4 2020-01-04 0 3
5: 1 5 2020-01-05 0 3
6: 1 6 2020-01-06 0 3
7: 1 7 2020-01-07 0 3
8: 1 8 2020-01-08 4 6
9: 1 9 2020-01-09 0 6
10: 1 10 2020-01-10 0 6
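To see why clamping a plain `cumsum()` does not give the intended result, it can help to compare it with a recurrence that clamps at every step. This is a sketch on a bare vector holding the example's `DISPENSED_DURATION` values; the variable names `clamp_after` and `clamp_within` are made up for illustration.

```r
x <- c(4, 0, 0, 0, 0, 0, 0, 4, 0, 0)

# Clamp after accumulating: the running sum goes negative behind the scenes,
# so the second 4 only brings it back up to 0.
clamp_after <- pmax(cumsum(x - 1), 0)

# Clamp inside the recurrence: each step starts from the already-clamped
# previous value, so the second 4 refills from 0.
clamp_within <- Reduce(function(prev, this) max(0, prev + this - 1),
                       x, accumulate = TRUE)

rbind(clamp_after, clamp_within)
# clamp_after:  3 2 1 0 0 0 0 0 0 0
# clamp_within: 4 3 2 1 0 0 0 3 2 1
```

The two also differ in the first row: `Reduce()` without an `init` value uses the first element unchanged, while `cumsum(x - 1)` already subtracts 1 from it.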