R: find the index of the previous occurrence of a modality within an (ordered) group
It is easier to explain with an example. Consider this data:
library(dplyr)
n_data = 20
set.seed(123)
data_standout = data.frame(group = sample(c('kids','monkeys','banana', 'latte'),
size = n_data, replace=TRUE),
time = runif(n_data),
modality = sample(c('grande','small','large'),
size = n_data, replace=TRUE),
stringsAsFactors = FALSE)
data_standin = data.frame(group = c('kids','monkeys','banana', 'latte'),
time = runif(4, 0, min(data_standout$time)),
modality = rep('small', 4), stringsAsFactors = FALSE)
data_final = bind_rows(data_standout, data_standin)
OK. Now let's look at one group in particular: monkeys.
# A tibble: 7 x 3
# Groups: group [1]
group time modality
<chr> <dbl> <chr>
1 monkeys 0.09211798 small
2 monkeys 0.17505265 grande
3 monkeys 0.32037324 large
4 monkeys 0.43489274 large
5 monkeys 0.46677904 small
6 monkeys 0.48861303 grande
7 monkeys 0.78229430 small
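The snippets further down refer to an object named monkeys that the text never defines. Below is a minimal sketch, assuming it is simply the monkeys rows sorted by time and grouped by group. The rows are entered by hand from the table above, because the values drawn by sample() for a given seed changed in R 3.6.0, so the simulated data may not reproduce on a different R version.

```r
library(dplyr)

# Hand-entered copy of the monkeys rows shown above (assumption: this is
# the object the later snippets call `monkeys`).
monkeys <- data.frame(
  group    = "monkeys",
  time     = c(0.09211798, 0.17505265, 0.32037324, 0.43489274,
               0.46677904, 0.48861303, 0.78229430),
  modality = c("small", "grande", "large", "large",
               "small", "grande", "small"),
  stringsAsFactors = FALSE
) %>%
  arrange(time) %>%
  group_by(group)

# Presumably equivalent when starting from the simulated data:
# monkeys <- data_final %>% filter(group == "monkeys") %>%
#   arrange(time) %>% group_by(group)
```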
Speed is not the main requirement, but dplyr is (the rest of the data pipeline is in dplyr).
Taking the monkeys data as an example:
monkeys %>%
arrange(time) %>%
group_by(group) %>%
mutate(small_idx = lag(cummax(ifelse(modality == 'small', seq_along(modality), 0))))
# A tibble: 7 x 4
# Groups: group [1]
# group time modality small_idx
# <fctr> <dbl> <fctr> <dbl>
#1 monkeys 0.09211798 small NA
#2 monkeys 0.17505265 grande 1
#3 monkeys 0.32037324 large 1
#4 monkeys 0.43489274 large 1
#5 monkeys 0.46677904 small 1
#6 monkeys 0.48861303 grande 5
#7 monkeys 0.78229430 small 5
cummax gives the largest index of 'small' seen so far, including the current row:
cummax(with(monkeys, ifelse(modality == 'small', seq_along(modality), 0)))
# [1] 1 1 1 1 5 5 7
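To make the intermediate step visible, here is a small sketch that splits the expression in two. The modality vector is copied by hand from the monkeys table. The names pos and running are only illustrative.

```r
modality <- c("small", "grande", "large", "large", "small", "grande", "small")

# Step 1: keep the row position where modality is 'small', 0 everywhere else.
pos <- ifelse(modality == "small", seq_along(modality), 0)
pos
# [1] 1 0 0 0 5 0 7

# Step 2: the running maximum carries the latest 'small' position forward.
running <- cummax(pos)
running
# [1] 1 1 1 1 5 5 7
```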
lag then shifts the result down by one row, giving the largest index of 'small' seen before the current row:
lag(cummax(with(monkeys, ifelse(modality == 'small', seq_along(modality), 0))))
# [1] NA 1 1 1 1 5 5
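One detail worth checking is what happens when a group does not start with 'small'. In the sketch below (the modality vector is made up for illustration), rows with no earlier 'small' get 0 rather than NA, because 0 is the filler value used in ifelse. The code calls dplyr::lag explicitly, since stats::lag from base R does not shift a plain vector. Replacing 0 with NA is one possible convention, not something the answer prescribes.

```r
library(dplyr)

# Hypothetical group whose first rows are not 'small'.
modality <- c("grande", "large", "small", "grande")

idx <- dplyr::lag(cummax(ifelse(modality == "small", seq_along(modality), 0)))
idx
# [1] NA  0  0  3

# Treat 0 ("no earlier 'small'") the same way as the leading NA.
idx_clean <- replace(idx, which(idx == 0), NA)
idx_clean
# [1] NA NA NA  3
```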
This is really cool: the trick (ifelse + cummax) generalizes easily to other related problems! Thanks a lot.
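As one illustration of how the trick might generalize, here is a sketch that wraps it in a helper and applies it to every group at once, then uses the index to look up the time of the previous 'small'. The helper name prev_index, the toy data, and the small_time column are assumptions made for this example and do not come from the original posts.

```r
library(dplyr)

# Hypothetical helper: for each element, the position of the most recent
# earlier element equal to `value` (NA if there is none).
prev_index <- function(x, value) {
  idx <- dplyr::lag(cummax(ifelse(x == value, seq_along(x), 0)))
  replace(idx, which(idx == 0), NA)
}

# Small hand-made data set (not the simulated data from the question).
toy <- data.frame(
  group    = c("a", "a", "a", "b", "b", "b"),
  time     = c(0.3, 0.1, 0.2, 0.6, 0.4, 0.5),
  modality = c("large", "small", "grande", "small", "large", "small"),
  stringsAsFactors = FALSE
)

result <- toy %>%
  arrange(group, time) %>%
  group_by(group) %>%
  mutate(
    small_idx  = prev_index(modality, "small"),
    # The index is relative to the sorted group, so it can subset `time`.
    small_time = time[small_idx]
  ) %>%
  ungroup()
```

Because the index counts positions inside each sorted group, the data must be arranged by time before group_by and mutate, as in the original answer.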