R 修改函数以查找每行一个数字的连续运行的最大长度

R 修改函数以查找每行一个数字的连续运行的最大长度,r,data-manipulation,R,Data Manipulation,我想修改问题的答案,或者有一个新的解决方案,以包含另一列,显示第二大连续运行的“0”。下面是我的示例数据和代码,函数在月份列上运行,第二大运行列是我希望添加的。我正在处理一个大数据集,因此效率越高越好,任何想法都会受到欢迎,谢谢 样本数据 structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9), V1 = c("A", "B", "A", "B", "B", "A", "A", "B", "B"), V2 = c(21, 233, 185, 85,

我想修改问题的答案,或者有一个新的解决方案,以包含另一列,显示第二大连续运行的“0”。下面是我的示例数据和代码,函数在月份列上运行,
第二大运行列是我希望添加的。我正在处理一个大数据集,因此效率越高越好,任何想法都会受到欢迎,谢谢

样本数据

structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9), V1 = c("A", 
"B", "A", "B", "B", "A", "A", "B", "B"), V2 = c(21, 233, 185, 
85, 208, 112, 238, 66, 38), V3 = c(149, 250, 218, 104, 62, 19, 
175, 168, 28), Jan = c(10, 20, 10, 12, 76, 28, 137, 162, 101), 
    Feb = c(20, 25, 15, 0, 89, 0, 152, 177, 119), March = c(0, 
    28, 20, 14, 108, 0, 165, 194, 132), April = c(0, 34, 25, 
    16, 125, 71, 181, 208, 149), May = c(25, 0, 30, 22, 135, 
    0, 191, 224, 169), June = c(29, 0, 35, 24, 145, 0, 205, 244, 
    187), July = c(34, 0, 40, 28, 163, 0, 217, 256, 207), August = c(37, 
    0, 45, 29, 173, 0, 228, 276, 221), Sep = c(0, 39, 50, 31, 
    193, 0, 239, 308, 236), Oct = c(0, 48, 55, 35, 210, 163, 
    252, 0, 247), Nov = c(48, 55, 60, 40, 221, 183, 272, 0, 264
    ), Dec = c(50, 60, 65, 45, 239, 195, 289, 0, 277), `Second largest run` = c(1, 
    NA, NA, NA, NA, 2, NA, NA, NA), result = c(2, 4, -Inf, 1, 
    -Inf, 5, -Inf, 3, -Inf)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))
代码


与从运行长度编码(
rle
)函数中获取
max
不同,我们希望对输出进行
排序
,然后提取所需的索引。当我们请求一个不存在的索引时,我们将得到NA——例如,第2行中没有第二次运行的零

ordered_runs = function(x, val = 0, idx = 1) {
  with(rle(x), sort(lengths[values == val], decreasing = TRUE))[idx]
}

test$result_1 <- apply(test[,-c(1:4,17:18)], MARGIN = 1, ordered_runs, idx = 1)
test$result_2 <- apply(test[,-c(1:4,17:18)], MARGIN = 1, ordered_runs, idx = 2)
ordered_runs=函数(x,val=0,idx=1){
使用(rle(x),排序(长度[values==val],递减=TRUE))[idx]
}

test$result_1这里有一个使用
数据的选项。表
对于OP的大型数据集应该非常快,并且还可以同时识别所有零序列:

library(data.table)
setDT(DF)
cols <- c("Jan", "Feb", "March", "April", "May", "June", "July", "August", "Sep", "Oct", "Nov", "Dec")

#convert into a long format
m <- melt(DF, measure.vars=cols)[
    #identify consecutive sequences of the same number and count
    order(ID), c("rl", "rw") := .(rl <- rleid(ID, value), rowid(rl))][
        #extract the last element where values = 0 (that is the length of sequences of zeros)
        value == 0L, .(ID=ID[.N], len=rw[.N]), rl][
            #sort in descending order for length of sequences
            order(ID, -len)]

#pivot into wide format and perform a update join
wide <- dcast(m, ID ~ rowid(ID), value.var="len")
DF[wide, on=.(ID), (names(wide)) := mget(names(wide))]
数据:


DF您能解释一下
第二大运行列如何具有1,NA,NA,NA,NA,2,NA,NA,NA,NA这样的值吗?您是对的,第一行是平局,这很完美,谢谢!
> test[,c(1,17:20)]
# A tibble: 9 x 5
     ID `Second largest run` result result_1 result_2
  <dbl>                <dbl>  <dbl>    <int>    <int>
1     1                    1      2        2        2
2     2                   NA      4        4       NA
3     3                   NA   -Inf       NA       NA
4     4                   NA      1        1       NA
5     5                   NA   -Inf       NA       NA
6     6                    2      5        5        2
7     7                   NA   -Inf       NA       NA
8     8                   NA      3        3       NA
9     9                   NA   -Inf       NA       NA
library(data.table)
setDT(DF)
cols <- c("Jan", "Feb", "March", "April", "May", "June", "July", "August", "Sep", "Oct", "Nov", "Dec")

#convert into a long format
m <- melt(DF, measure.vars=cols)[
    #identify consecutive sequences of the same number and count
    order(ID), c("rl", "rw") := .(rl <- rleid(ID, value), rowid(rl))][
        #extract the last element where values = 0 (that is the length of sequences of zeros)
        value == 0L, .(ID=ID[.N], len=rw[.N]), rl][
            #sort in descending order for length of sequences
            order(ID, -len)]

#pivot into wide format and perform a update join
wide <- dcast(m, ID ~ rowid(ID), value.var="len")
DF[wide, on=.(ID), (names(wide)) := mget(names(wide))]
   ID V1  V2  V3 Jan Feb March April May June July August Sep Oct Nov Dec  1  2
1:  1  A  21 149  10  20     0     0  25   29   34     37   0   0  48  50  2  2
2:  2  B 233 250  20  25    28    34   0    0    0      0  39  48  55  60  4 NA
3:  3  A 185 218  10  15    20    25  30   35   40     45  50  55  60  65 NA NA
4:  4  B  85 104  12   0    14    16  22   24   28     29  31  35  40  45  1 NA
5:  5  B 208  62  76  89   108   125 135  145  163    173 193 210 221 239 NA NA
6:  6  A 112  19  28   0     0    71   0    0    0      0   0 163 183 195  5  2
7:  7  A 238 175 137 152   165   181 191  205  217    228 239 252 272 289 NA NA
8:  8  B  66 168 162 177   194   208 224  244  256    276 308   0   0   0  3 NA
9:  9  B  38  28 101 119   132   149 169  187  207    221 236 247 264 277 NA NA
DF <- structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9), V1 = c("A", 
    "B", "A", "B", "B", "A", "A", "B", "B"), V2 = c(21, 233, 185, 
        85, 208, 112, 238, 66, 38), V3 = c(149, 250, 218, 104, 62, 19, 
            175, 168, 28), Jan = c(10, 20, 10, 12, 76, 28, 137, 162, 101), 
    Feb = c(20, 25, 15, 0, 89, 0, 152, 177, 119), March = c(0, 
        28, 20, 14, 108, 0, 165, 194, 132), April = c(0, 34, 25, 
            16, 125, 71, 181, 208, 149), May = c(25, 0, 30, 22, 135, 
                0, 191, 224, 169), June = c(29, 0, 35, 24, 145, 0, 205, 244, 
                    187), July = c(34, 0, 40, 28, 163, 0, 217, 256, 207), August = c(37, 
                        0, 45, 29, 173, 0, 228, 276, 221), Sep = c(0, 39, 50, 31, 
                            193, 0, 239, 308, 236), Oct = c(0, 48, 55, 35, 210, 163, 
                                252, 0, 247), Nov = c(48, 55, 60, 40, 221, 183, 272, 0, 264
                                ), Dec = c(50, 60, 65, 45, 239, 195, 289, 0, 277), `1` = c(2L, 
                                    4L, NA, 1L, NA, 5L, NA, 3L, NA), `2` = c(2L, NA, NA, NA, 
                                        NA, 2L, NA, NA, NA)), row.names = c(NA, -9L), class = "data.frame")