Count/repeat with a condition in R


r dataframe


Following up on a question I asked earlier, I have the following table:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           0
     5     111      11            2           2
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

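I want the LastDiscount count to sit on the first row where the same SKU starts a different discount in a later week. For example, SKU 111 is discounted through week 2 and its next discount starts in week 4, which is 2 weeks after the last discount. The problem is that I want that result to appear in week 4, where the next discount campaign starts.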

Something like this:

  Week   SKU   Discount(%)   Duration  LastDiscount
     1     111       5            2           0
     2     111       5            2           0
     3     111       0            0           0
     4     111      10            2           2
     5     111      11            2           0
     1     222       0            0           0
     2     222      10            3           0
     3     222      15            3           0
     4     222      20            3           0

This is the code I have at the moment:

df1 %>%
  group_by(SKU) %>% 
  mutate(Duration = with(rle(Discount > 0), rep(lengths*values, 
        lengths)),
         temp = with(rle(Discount > 0), sum(values != 0)), 
         LastDiscount = if(temp[1] > 1) c(rep(0, n()-1), temp[1]) else 0) %>%
  select(-temp)
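
For reference, the two rle() lines in the code above can be unpacked on a single SKU. This is only an illustrative sketch: the vector disc is copied by hand from SKU 111's rows, and the names runs, duration, n_runs and last_discount are made up for the example. It suggests that temp counts the number of discount runs rather than the gap in weeks, and that c(rep(0, n()-1), temp[1]) always places that count on the last row of the SKU.

```r
# Discounts for SKU 111, weeks 1 to 5 (copied from the table above)
disc <- c(5, 5, 0, 10, 11)

# Run-length encode "is there a discount?"
runs <- rle(disc > 0)

# Length of each discount run, repeated over the rows of that run
# (runs without a discount are multiplied by FALSE, i.e. become 0)
duration <- rep(runs$lengths * runs$values, runs$lengths)

# Number of separate discount runs (what the question calls temp)
n_runs <- sum(runs$values != 0)

# Same rule as in the question: the count goes on the last row only
last_discount <- if (n_runs > 1) c(rep(0, length(disc) - 1), n_runs) else 0

print(duration)
print(n_runs)
print(last_discount)
```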

Is LastDiscount always one row lower than where it should be? If so, you can do this:

library(dplyr)
df %>% 
  mutate(LastDiscount2=lead(LastDiscount))
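
The lead() shortcut relies on the value always sitting exactly one row too low, and without group_by(SKU) it can pull a value from the next SKU. As an alternative, here is a minimal dplyr sketch that computes the gap directly from the discount runs. It is not the answer's method. It assumes the discount column is named Discount, that Week is positive and sorted within each SKU, and the helper columns has_disc, run_start, run_end and prev_end are hypothetical names introduced for illustration.

```r
library(dplyr)

# Example data rebuilt by hand from the question's table
df1 <- data.frame(
  Week     = c(1, 2, 3, 4, 5, 1, 2, 3, 4),
  SKU      = c(111, 111, 111, 111, 111, 222, 222, 222, 222),
  Discount = c(5, 5, 0, 10, 11, 0, 10, 15, 20)
)

result <- df1 %>%
  group_by(SKU) %>%
  mutate(
    has_disc  = Discount > 0,
    # first and last row of each run of consecutive discounted weeks
    run_start = has_disc & !dplyr::lag(has_disc, default = FALSE),
    run_end   = has_disc & !dplyr::lead(has_disc, default = FALSE),
    # week in which the most recent earlier run ended (0 if there is none)
    prev_end  = dplyr::lag(cummax(ifelse(run_end, Week, 0)), default = 0),
    # weeks since the previous run ended, reported on the run's first row only
    LastDiscount = ifelse(run_start & prev_end > 0, Week - prev_end, 0)
  ) %>%
  ungroup() %>%
  select(-has_disc, -run_start, -run_end, -prev_end)

print(result)
```

On the example data this puts 2 on week 4 of SKU 111 and 0 everywhere else, matching the desired table in the question.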

Here is an option using data.table. If the OP is only looking for a dplyr solution, I will delete it:

#calculate duration of discount and also the start and end of discount period
DT[, c("Duration", "disc_seq") := {
        dur <- sum(`Discount(%)` > 0L)
        disc_seq <- rep("", .N)
        if (dur > 0) {
            disc_seq[1L] <- "S"
            disc_seq[length(disc_seq)] <- "E"
        }
        .(dur, disc_seq)
    }, 
    .(SKU, rleid(`Discount(%)` > 0L))]
DT[]

#use a non-equi join to find the end of previous discount period to update LastDiscount column of the start of current discount period
DT[, LastDiscount := 0L]
DT[disc_seq=="S", LastDiscount := {
        ld <- DT[disc_seq=="E"][.SD, on=.(SKU, Week<Week), by=.EACHI, i.Week - x.Week]$V1
        replace(ld, is.na(ld), 0L)
    }]
DT[]

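Output:

   Week SKU Discount(%) Duration disc_seq LastDiscount
1:    1 111           5        2        S            0
2:    2 111           5        2        E            0
3:    3 111           0        0                     0
4:    4 111          10        2        S            2
5:    5 111          11        2        E            0
6:    1 222           0        0                     0
7:    2 222          10        3        S            0
8:    3 222          15        3                     0
9:    4 222          20        3        E            0

Data:

library(data.table)
DT <- fread("Week   SKU   Discount(%)
1     111       5
2     111       5
3     111       0
4     111      10
5     111      11
1     222       0
2     222      10
3     222      15
4     222      20")

For the discount duration step I get the following message: Error in .(SKU, rleid(`Discount(%)` > 0L)): could not find function ".", and for LastDiscount this one: Error in `:=`(entresc, 0L): Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

When I change the DF, I get the following error: Error in `[.data.table`(DT, disc_seq == "S", `:=`(LastDiscount, {: Supplied 10 items to be assigned to 9 items of column 'LastDiscount'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.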
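
The first two errors quoted in this thread (could not find function "." and Check that is.data.table(DT) == TRUE) are typical of running data.table syntax on an object that is still a plain data.frame. That may or may not be what happened here, so the following is only a sketch under that assumption. The object DF and the column n_disc are hypothetical and exist only for the example.

```r
library(data.table)

# Hypothetical starting point: the same data held in a plain data.frame
DF <- data.frame(
  Week = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L),
  SKU = c(111L, 111L, 111L, 111L, 111L, 222L, 222L, 222L, 222L),
  `Discount(%)` = c(5L, 5L, 0L, 10L, 11L, 0L, 10L, 15L, 20L),
  check.names = FALSE  # keep the "Discount(%)" column name unchanged
)

is.data.table(DF)  # FALSE: := and .() would fail on this object

# as.data.table() returns a data.table copy; setDT(DF) would convert in place
DT <- as.data.table(DF)

# := and .() are now handled by data.table's own `[` method
DT[, LastDiscount := 0L]
DT[, n_disc := sum(`Discount(%)` > 0L), by = .(SKU)]

print(DT)
```

Once is.data.table(DT) returns TRUE, the Duration and LastDiscount steps from the answer can be run on DT unchanged.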