Sum of blocks of positive values in R

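I have a large dataset: 150k rows, about 11 MB in size. Each row contains an hourly profit figure, which can be positive, negative, or zero. I'm trying to compute a new variable equal to the total profit of each positive "block" (run of consecutive positive rows). Hopefully the dataset below makes this self-explanatory.

"profit" is the input variable. I can get the next two columns, but I can't work out "profit_block". Any help would be greatly appreciated.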

dat <- data.frame(profit = c(20, 10, 5, 10, -20, -100, -40, 500, 27, -20),
                  indic_pos = c( 1, 1, 1, 1, 0, 0, 0, 1, 1, 0),
                  cum_profit = c(20, 30, 35, 45, 0, 0, 0, 500, 527, 0),
                  profit_block = c(45, 45, 45, 45, 0, 0, 0, 527, 527, 0))

   profit indic_pos cum_profit profit_block
1      20         1         20           45
2      10         1         30           45
3       5         1         35           45
4      10         1         45           45
5     -20         0          0            0
6    -100         0          0            0
7     -40         0          0            0
8     500         1        500          527
9      27         1        527          527
10    -20         0          0            0
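For comparison, a minimal base-R sketch (an alternative, not the approach used in the answer) that computes profit_block directly from profit, without needing indic_pos or cum_profit. It assumes a block is a run of strictly positive values, so zero counts as non-positive; the names is_pos and run_id are illustrative only.

```r
profit <- c(20, 10, 5, 10, -20, -100, -40, 500, 27, -20)

# TRUE where the row belongs to a positive block (assumption: zero is not positive).
is_pos <- profit > 0

# Start a new run id whenever is_pos flips between TRUE and FALSE.
run_id <- cumsum(c(TRUE, diff(is_pos) != 0))

# Sum profit within each run, broadcast the sum back to every row of the run,
# then zero out the rows of non-positive runs (logical is coerced to 0/1).
profit_block <- ave(profit, run_id, FUN = sum) * is_pos

print(profit_block)
```

Everything here is vectorised, so it should scale to the 150k-row case without an explicit loop.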

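We can use rleid (from the data.table package) to create groups based on the sign of profit, i.e. adjacent elements with the same sign form one group, and then take the max of "cum_profit" within each group.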

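library(dplyr)
library(data.table)  # rleid() comes from data.table, not dplyr

dat %>%
  group_by(grp = rleid(sign(profit))) %>%   # one group per run of equal signs
  mutate(profit_block2 = max(cum_profit)) %>%
  ungroup() %>%
  select(-grp)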
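If you would rather avoid the data.table dependency, here is a minimal base-R sketch of the same idea. run_index() is a hypothetical helper built on base rle() that mimics rleid() for a single vector, and ave() stands in for group_by() + mutate().

```r
# Hypothetical helper: base-R stand-in for data.table::rleid() on one vector.
run_index <- function(x) {
  r <- rle(x)                                # lengths of runs of equal values
  rep(seq_along(r$lengths), r$lengths)       # number each run 1, 2, 3, ...
}

dat <- data.frame(profit     = c(20, 10, 5, 10, -20, -100, -40, 500, 27, -20),
                  cum_profit = c(20, 30, 35, 45, 0, 0, 0, 500, 527, 0))

# Group rows by runs of the same sign, as in the dplyr answer.
grp <- run_index(sign(dat$profit))
print(grp)

# Within each run, every row gets the run's maximum cum_profit.
dat$profit_block2 <- ave(dat$cum_profit, grp, FUN = max)
print(dat$profit_block2)
```

Unlike rleid(), this helper accepts only a single vector rather than several columns.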

-output
# A tibble: 10 x 5
#   profit indic_pos cum_profit profit_block profit_block2
#    <dbl>     <dbl>      <dbl>        <dbl>         <dbl>
# 1     20         1         20           45            45
# 2     10         1         30           45            45
# 3      5         1         35           45            45
# 4     10         1         45           45            45
# 5    -20         0          0            0             0
# 6   -100         0          0            0             0
# 7    -40         0          0            0             0
# 8    500         1        500          527           527
# 9     27         1        527          527           527
#10    -20         0          0            0             0

akrun, thanks a lot for the quick solution. It worked! FYI, the rleid function seems to be in the data.table package. Once I installed it, I was all set.
@danj Thanks. Yes, it's from data.table; I forgot to mention it.