R: maximum of multiple data.table subsets


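Given are a data.table with the base data, a vector of subset start indices (startIndex), and a vector of subset durations (duration). Every duration should be applied to every start index, and I need the maximum of val over each resulting subset.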

base <- data.table(idx=c(1,2,3,4,5,6,7,8,9,10), val=c(11,12,13,14,15,16,17,18,19,20))
startIndex <- c(2, 4, 7, 9)
duration <- c(1,2,3)

Thanks a lot.

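I can't think of a particularly elegant solution, but I think a map function should do the job. This brute-forces every combination, so there may well be a more efficient solution, but it should work.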

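library(data.table)
library(dplyr)   # %>%, mutate(), slice(), pull()
library(purrr)   # map2_dbl()

base <- data.table(idx=c(1,2,3,4,5,6,7,8,9,10), val=c(11,12,13,14,15,16,17,18,19,20))
startIndex <- c(2, 4, 7, 9)
duration <- c(1,2,3)

# Every (startIndex, duration) pair; the window is startIndex..endIndex inclusive
combos <- expand.grid(startIndex = startIndex, 
                      duration = duration) %>% 
  mutate(endIndex = startIndex + duration)

# startIndex:endIndex selects the whole window, not just its two end rows;
# slice() silently ignores positions beyond nrow(base)
max_slices <- map2_dbl(combos$startIndex, combos$endIndex, function(startIndex, endIndex){
  slice(base, startIndex:endIndex) %>% 
    pull(val) %>% 
    max()
})

result <- combos %>% 
  cbind(max = max_slices)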

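I have a solution using a map function, but I don't think I kept everything as a data.table, so it may not be satisfactory. If it isn't, let me know and I can take another look or defer to the other answers. One option is to run data.table functions on the output.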

library(tidyverse)
library(data.table)
library(dtplyr)
#> Source: local data table [12 x 3]
#> Call:   copy(`_DT1`)[, `:=`(max = map2_dbl(startIndex, duration, ~max(..base$val[.x:(.x + 
#>     .y)])))]
#> 
#>   startIndex duration   max
#>             
#> 1          2        1    13
#> 2          2        2    14
#> 3          2        3    15
#> 4          4        1    15
#> 5          4        2    16
#> 6          4        3    17
#> # ... with 6 more rows
#> 
#> # Use as.data.table()/as.data.frame()/as_tibble() to access results

Created on 2021-04-04 by the reprex package (v2.0.0)
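
For readers who want to stay inside data.table for the same map-style idea, here is a minimal sketch. It is not the answer author's code: `window_max` is a hypothetical helper name, and clipping each window at the last row of `base` is an assumption made so that windows running past the data do not produce NA.

```r
library(data.table)

base <- data.table(idx = c(1,2,3,4,5,6,7,8,9,10),
                   val = c(11,12,13,14,15,16,17,18,19,20))
startIndex <- c(2, 4, 7, 9)
duration <- c(1, 2, 3)

# Hypothetical helper: maximum of val over rows start..(start + len),
# clipped to the last row of base (an assumption, see the text above)
window_max <- function(start, len) {
  end <- pmin(start + len, nrow(base))
  mapply(function(s, e) max(base$val[s:e]), start, end)
}

# CJ() builds the cross join of both vectors as a data.table,
# sorted by startIndex and then duration
res <- CJ(startIndex = startIndex, duration = duration)
res[, max := window_max(startIndex, duration)]
print(res)
```

The result is an ordinary data.table, so further data.table operations can be chained onto `res` directly.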

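Here is a data.table approach using a non-equi join. First, use expand.grid to combine the start indices and durations. Then compute the end index for each row. Finally, join with your base on idx lying between the start and end index, and keep the maximum val for each window.

library(data.table)

dt <- data.table(expand.grid(idxStart = startIndex, Duration = duration)) 

dt[ , idxEnd := idxStart + Duration]

# base is x and dt is i, so with by = .EACHI max(val) is taken over all rows
# of base inside each window. With dt as x, each matching base row would
# overwrite Max in turn and leave the window's last val, which equals the
# maximum only because val is increasing in this data.
dt[ , Max := base[dt, on = .(idx >= idxStart, idx <= idxEnd), max(val), by = .EACHI]$V1]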

    idxStart Duration idxEnd Max
 1:        2        1      3  13
 2:        4        1      5  15
 3:        7        1      8  18
 4:        9        1     10  20
 5:        2        2      4  14
 6:        4        2      6  16
 7:        7        2      9  19
 8:        9        2     11  20
 9:        2        3      5  15
10:        4        3      7  17
11:        7        3     10  20
12:        9        3     12  20
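
Because val increases steadily in the question's data, the maximum of every window is simply its last value, so the output above cannot tell a true maximum from "last value in the window". As a sanity check, here is a small sketch with made-up, non-monotone values. `base2` and `windows` are hypothetical names and the numbers are invented purely for illustration.

```r
library(data.table)

# Hypothetical, deliberately non-monotone values
base2 <- data.table(idx = as.numeric(1:10),
                    val = c(5, 30, 7, 2, 25, 1, 9, 8, 3, 4))
windows <- data.table(idxStart = c(2, 4, 7), idxEnd = c(4, 6, 10))

# base2 is x and windows is i: with by = .EACHI, max(val) is evaluated once
# per window over all rows of base2 whose idx falls inside that window
res <- base2[windows, on = .(idx >= idxStart, idx <= idxEnd),
             .(Max = max(val)), by = .EACHI]
windows[, Max := res$Max]
print(windows)
```

Here the three windows cover val values (30, 7, 2), (2, 25, 1) and (9, 8, 3, 4), so a per-window maximum should pick out the first or middle element rather than the last one.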