R: How do I transfer negative values from the current row to the previous row in a data frame?

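I want to transfer the negative values in the current row to the previous row by adding them to the previous row's value, within each group. Here is a sample of the raw data I have: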

raw_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
                   YEARMO = rep(c(201801:201806),3),
                   VALUE = c(100,-10,20,70,-50,30,20,60,40,-20,-10,50,0,10,-30,50,100,-100))
> raw_data
  GROUP YEARMO VALUE
1      A 201801   100  
2      A 201802   -10
3      A 201803    20
4      A 201804    70
5      A 201805   -50
6      A 201806    30
7      B 201801    20
8      B 201802    60
9      B 201803    40
10     B 201804   -20
11     B 201805   -10
12     B 201806    50
13     C 201801     0
14     C 201802    10
15     C 201803   -30
16     C 201804    50
17     C 201805   100
18     C 201806  -100

Here is my desired output:

final_data <- data.frame(GROUP = rep(c('A','B','C'),each = 6),
                   YEARMO = rep(c(201801:201806),3),
                   VALUE = c(90,0,20,20,0,30,20,60,10,0,0,50,-20,0,0,50,0,0))
> final_data
   GROUP YEARMO VALUE
1      A 201801    90
2      A 201802     0
3      A 201803    20
4      A 201804    20
5      A 201805     0
6      A 201806    30
7      B 201801    20
8      B 201802    60
9      B 201803    10
10     B 201804     0
11     B 201805     0
12     B 201806    50
13     C 201801   -20
14     C 201802     0
15     C 201803     0
16     C 201804    50
17     C 201805     0
18     C 201806     0

The following data frames show how the transfer proceeds within each group:

Trans_GRP_A <- data.frame(GROUP = rep('A',each = 6),
                   YEARMO = c(201801:201806),
                   VALUE = c(100,-10,20,70,-50,30),
                   ITER_1 = c(100,-10,20,20,0,30),
                   ITER_2 = c(90,0,20,20,0,30))
> Trans_GRP_A
  GROUP YEARMO VALUE ITER_1 ITER_2
1     A 201801   100    100     90
2     A 201802   -10    -10      0
3     A 201803    20     20     20
4     A 201804    70     20     20
5     A 201805   -50      0      0
6     A 201806    30     30     30

> Trans_GRP_B <- data.frame(GROUP = rep('B',each = 6),
+                           YEARMO = c(201801:201806),
+                           VALUE = c(20,60,40,-20,-10,50),
+                           ITER_1 = c(20,60,40,-30,0,50),
+                           ITER_2 = c(20,60,10,0,0,50))
> Trans_GRP_B
  GROUP YEARMO VALUE ITER_1 ITER_2
1     B 201801    20     20     20
2     B 201802    60     60     60
3     B 201803    40     40     10
4     B 201804   -20    -30      0
5     B 201805   -10      0      0
6     B 201806    50     50     50

> Trans_GRP_C <- data.frame(GROUP = rep('C',each = 6),
+                           YEARMO = c(201801:201806),
+                           VALUE = c(0,10,-30,50,100,-100),
+                           ITER_1 = c(0,10,-30,50,0,0),
+                           ITER_2 = c(0,-20,0,50,0,0),
+                           ITER_3 = c(-20,0,0,50,0,0))
> Trans_GRP_C
  GROUP YEARMO VALUE ITER_1 ITER_2 ITER_3
1     C 201801     0      0      0    -20
2     C 201802    10     10    -20      0
3     C 201803   -30    -30      0      0
4     C 201804    50     50     50     50
5     C 201805   100      0      0      0
6     C 201806  -100      0      0      0
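
The transfer logic is as follows:

  • Replace the negative value with 0
  • Add the current row's negative value to the previous row's value
  • Keep transferring the negative value to the previous row until the value becomes positive or 0
  • If the transfer never produces a non-negative value, keep transferring until the first row of the group is reached; the first row in each group is YEARMO = 201801

Any solution is welcome. I think a vectorized solution might run faster.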

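This is a tricky problem. I tried to find a vectorized solution, but so far the only approach that works is to loop backward over the rows within each group: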

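    library(data.table)
    DT <- as.data.table(raw_data)
    DT$final <- final_data$VALUE  # expected result, kept alongside for comparison
    DT[, new := {
      x <- VALUE
      sn <- 0  # running total of negative values not yet absorbed
      for (i in .N:1) {  # walk each group from its last row to its first
        if (i > 1) {
          if (x[i] < 0) {
            # negative value: add it to the carry and zero the row
            sn <- sn + x[i]
            x[i] <- 0
          } else {
            # non-negative value: absorb as much of the carry as it can
            tmp <- pmax(x[i] + sn, 0)
            sn <- sn + x[i] - tmp
            x[i] <- tmp
          }
        } else {
          # first row of the group takes whatever carry is left
          x[i] <- x[i] + sn
        }
      }
      x
    }, by = GROUP]
    DT[]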
    

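sn stores (that is, accumulates) the negative values, which are then "consumed" by the positive values that follow in reverse order.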
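
To make the role of sn easier to follow, the same backward pass can be written as a standalone base R function and run on group C alone. This is only a sketch, not part of the original answer: the name carry_back is hypothetical, and the two branches of the loop are merged into one, which gives the same result because sn is never positive.

```r
# Hypothetical helper (not from the original answer): walk the vector from the
# last element to the first, carrying the unabsorbed negative amount in sn.
carry_back <- function(x) {
  sn <- 0
  for (i in rev(seq_along(x))) {
    if (i > 1) {
      tmp <- max(x[i] + sn, 0)   # what is left of x[i] after absorbing sn
      sn <- sn + x[i] - tmp      # whatever could not be absorbed moves on
      x[i] <- tmp
    } else {
      x[i] <- x[i] + sn          # the first row takes the remainder, even if negative
    }
  }
  x
}

group_c <- c(0, 10, -30, 50, 100, -100)
carry_back(group_c)
```

For group C this returns -20, 0, 0, 50, 0, 0, which matches the ITER_3 column in the question.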

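Here is another option, which recursively adds the positive part of the vector to the negative part of the vector until there are no more negative values, or until it has run .N times (where .N is the number of rows in each group).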
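
A minimal sketch of that idea, written in base R rather than data.table and not taken from the original answer. The helper name shift_negatives is hypothetical, and it assumes "adding the negative part" means adding each negative value to the row before it. ave() stands in for the by-group step.

```r
# Hypothetical helper (an assumption, not the original answer's code).
shift_negatives <- function(x) {
  for (k in seq_along(x)) {          # at most length(x) passes, i.e. .N per group
    neg <- pmin(x, 0)                # negative part of every element
    neg[1] <- 0                      # the first row has nowhere to send its negative
    if (!any(neg < 0)) break         # nothing left to move
    x <- (x - neg) + c(neg[-1], 0)   # zero the negatives, add each to the row before it
  }
  x
}

raw_data <- data.frame(GROUP = rep(c("A", "B", "C"), each = 6),
                       YEARMO = rep(201801:201806, 3),
                       VALUE = c(100, -10, 20, 70, -50, 30, 20, 60, 40,
                                 -20, -10, 50, 0, 10, -30, 50, 100, -100))

# ave() applies the helper within each GROUP and keeps the original row order
raw_data$OUTPUT <- ave(raw_data$VALUE, raw_data$GROUP, FUN = shift_negatives)
raw_data
```

Each pass moves every remaining negative value up by one row, so a negative value in the last row needs at most .N - 1 passes to reach the first row of its group. That is why capping the loop at .N passes is enough.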


I doubt that a purely vectorized solution exists. A loop construct is probably needed.
        GROUP YEARMO VALUE OUTPUT
     1:     A 201801   100     90
     2:     A 201802   -10      0
     3:     A 201803    20     20
     4:     A 201804    70     20
     5:     A 201805   -50      0
     6:     A 201806    30     30
     7:     B 201801    20     20
     8:     B 201802    60     60
     9:     B 201803    40     10
    10:     B 201804   -20      0
    11:     B 201805   -10      0
    12:     B 201806    50     50
    13:     C 201801     0    -20
    14:     C 201802    10      0
    15:     C 201803   -30      0
    16:     C 201804    50     50
    17:     C 201805   100      0
    18:     C 201806  -100      0