R: How to calculate a proportion by group

Tags: r, datatable, dplyr, plyr

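I am trying to calculate a running (consecutive) proportion of the target feature within each ID.

This is what I have tried on my dataset: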

df <-  df  %>%
  group_by(ID) %>%
  mutate(count_per_ID = row_number(),
         consecutive_target = sequence(rle(as.character(target))$lengths),
         val = ifelse(target == 0, 0, consecutive_target),
         proportion_target_by_ID = val / count_per_ID) %>%
  ungroup()

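I created count_per_ID, which numbers the rows within each ID group (a running row count).
Then consecutive_target counts the observations in the target feature and restarts every time the value changes, that is, every time target switches between 0 and 1.
val copies the values from consecutive_target where target is 1 and is set to 0 where target is 0.
proportion_target_by_ID takes val and divides it by count_per_ID.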

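The problem is that once val contains a 0, the proportion of target by ID is no longer calculated the way I intend, because the count of target rows starts over instead of carrying on.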
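To make the reset visible, here is a minimal base R sketch on a single ID. The vector is copied from the ID 11 rows of the table in this question; the intermediate name `runs` is only for illustration.

```r
# One ID's worth of target values (ID 11 in the question's table).
target <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1)

# rle() splits the vector into runs of equal values.
runs <- rle(as.character(target))
runs$lengths                      # run lengths: 3 3 1 3

# sequence() counts 1..n inside each run, so the counter restarts
# every time target switches between 0 and 1.
consecutive_target <- sequence(runs$lengths)

count_per_ID <- seq_along(target)
val <- ifelse(target == 0, 0, consecutive_target)

# Row 7 is a 0, so val falls back to 0, and row 8 starts again at 1.
# The proportion at row 8 is therefore 1/8 rather than 4/8.
proportion <- val / count_per_ID
round(proportion, 3)
```

The drop at rows 7 and 8 is the behaviour described above: val tracks the current run only, not the total number of target rows seen so far.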

      ID target count_per_ID consecutive_target   val proportion_target_by_ID
   <dbl>  <dbl>        <int>              <int> <dbl>                   <dbl>
 1    11      0            1                  1     0                       0
 2    11      0            2                  2     0                       0
 3    11      0            3                  3     0                       0
 4    11      1            4                  1     1                    0.25
 5    11      1            5                  2     2                     0.4
 6    11      1            6                  3     3                     0.5
 7    11      0            7                  1     0                       0
 8    11      1            8                  1     1                   0.125
 9    11      1            9                  2     2                   0.222
10    11      1           10                  3     3                     0.3
11    22      0            1                  1     0                       0
12    22      0            2                  2     0                       0
13    22      1            3                  1     1                   0.333
14    22      1            4                  2     2                     0.5
15    22      1            5                  3     3                     0.6
16    22      0            6                  1     0                       0
17    22      1            7                  1     1                   0.143
18    22      0            8                  1     0                       0
19    22      1            9                  1     1                   0.111
20    22      1           10                  2     2                     0.2

What the result should look like:

      ID target count_per_ID consecutive_target   val proportion_target_by_ID
   <dbl>  <dbl>        <int>              <int> <dbl>                   <dbl>
 1    11      0            1                  1     0                       0
 2    11      0            2                  2     0                       0
 3    11      0            3                  3     0                       0
 4    11      1            4                  1     1                    0.25
 5    11      1            5                  2     2                     0.4
 6    11      1            6                  3     3                     0.5
 7    11      0            7                  1     3                   0.428
 8    11      1            8                  1     4                     0.5
 9    11      1            9                  2     5                   0.555
10    11      1           10                  3     6                     0.6
11    22      0            1                  1     0                       0
12    22      0            2                  2     0                       0
13    22      1            3                  1     1                   0.333
14    22      1            4                  2     2                     0.5
15    22      1            5                  3     3                     0.6
16    22      0            6                  1     3                     0.5
17    22      1            7                  1     4                   0.571
18    22      0            8                  1     4                     0.5
19    22      1            9                  1     5                    0.55
20    22      1           10                  2     6                     0.6

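One option is to change the code that creates "val" from

val = ifelse(target == 0, 0, consecutive_target)

to

val = cumsum(target != 0)

so that val becomes a running count of the non-zero target rows within each ID and never resets.

Full code: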

df %>%
  group_by(ID) %>%
  mutate(count_per_ID = row_number(),
         consecutive_target = sequence(rle(as.character(target))$lengths),
         val = cumsum(target != 0),
         proportion_target_by_ID = val / count_per_ID)
# A tibble: 20 x 6
# Groups:   ID [2]
#      ID target count_per_ID consecutive_target   val proportion_target_by_ID
#   <dbl>  <dbl>        <int>              <int> <int>                   <dbl>
# 1    11      0            1                  1     0                       0
# 2    11      0            2                  2     0                       0
# 3    11      0            3                  3     0                       0
# 4    11      1            4                  1     1                    0.25
# 5    11      1            5                  2     2                     0.4
# 6    11      1            6                  3     3                     0.5
# 7    11      0            7                  1     3                   0.429
# 8    11      1            8                  1     4                     0.5
# 9    11      1            9                  2     5                   0.556
#10    11      1           10                  3     6                     0.6
#11    22      0            1                  1     0                       0
#12    22      0            2                  2     0                       0
#13    22      1            3                  1     1                   0.333
#14    22      1            4                  2     2                     0.5
#15    22      1            5                  3     3                     0.6
#16    22      0            6                  1     3                     0.5
#17    22      1            7                  1     4                   0.571
#18    22      0            8                  1     4                     0.5
#19    22      1            9                  1     5                   0.556
#20    22      1           10                  2     6                     0.6
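For readers who want to check the numbers without dplyr, the same running proportion can be sketched in base R with ave(). This is an alternative illustration, not part of the original answer; the data frame below is rebuilt from the ID and target columns of the tables above.

```r
# Rebuild the example data from the ID and target columns shown above.
df <- data.frame(
  ID     = rep(c(11, 22), each = 10),
  target = c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
             0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
)

# ave() applies FUN within each ID and returns a vector aligned with df.
# Running row number within each ID:
df$count_per_ID <- ave(df$target, df$ID, FUN = seq_along)

# Running count of non-zero target rows within each ID (never resets):
df$val <- ave(as.numeric(df$target != 0), df$ID, FUN = cumsum)

# Share of target rows seen so far within each ID:
df$proportion_target_by_ID <- df$val / df$count_per_ID

print(df)
```

Because val only ever increases within an ID, a 0 in target lowers the proportion gradually (for example 3/6 to 3/7 at row 7) instead of dropping it to 0.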
