R: How to calculate proportions by group


I am trying to calculate the running proportion of the target feature within each ID.

Dataset. This is what I have tried:

library(dplyr)

df <- df %>%
  group_by(ID) %>%
  mutate(count_per_ID = row_number(),
         consecutive_target = sequence(rle(as.character(target))$lengths),
         val = ifelse(target == 0, 0, consecutive_target),
         proportion_target_by_ID = val / count_per_ID) %>%
  ungroup()
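  • I created count_per_ID, which numbers the rows within each ID group (a running row count).
  • Next, consecutive_target counts the observations in the current run of target and restarts every time target switches between 0 and 1.
  • val copies the value of consecutive_target where target is 1 and sets it to 0 where target is 0.
  • proportion_target_by_ID divides val by count_per_ID.
The problem is that val, and therefore proportion_target_by_ID, drops back to 0 whenever target is 0, so the result is no longer the proportion of 1s seen so far in that ID: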
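To see what the consecutive_target and val steps produce, here is a small base-R sketch on the target values of ID 11, typed in by hand from the table below. The names target_11 and runs are hypothetical and used only for illustration.

```r
# target values for ID 11, copied from the output table
target_11 <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1)

# rle() splits the vector into runs of identical values
runs <- rle(as.character(target_11))
print(runs$lengths)          #> [1] 3 3 1 3

# sequence() counts 1, 2, ... inside each run, restarting at every switch
consecutive_target <- sequence(runs$lengths)
print(consecutive_target)    #> [1] 1 2 3 1 2 3 1 1 2 3

# the original val: keep the run counter where target is 1, else 0
val <- ifelse(target_11 == 0, 0, consecutive_target)
print(val)                   #> [1] 0 0 0 1 2 3 0 1 2 3
```

Because the counter restarts at each switch and is zeroed on every 0, val never accumulates across runs.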

      ID target count_per_ID consecutive_target   val proportion_target_by_ID
   <dbl>  <dbl>        <int>              <int> <dbl>                   <dbl>
 1    11      0            1                  1     0                   0    
 2    11      0            2                  2     0                   0    
 3    11      0            3                  3     0                   0    
 4    11      1            4                  1     1                   0.25 
 5    11      1            5                  2     2                   0.4  
 6    11      1            6                  3     3                   0.5  
 7    11      0            7                  1     0                   0    
 8    11      1            8                  1     1                   0.125
 9    11      1            9                  2     2                   0.222
10    11      1           10                  3     3                   0.3  
11    22      0            1                  1     0                   0    
12    22      0            2                  2     0                   0    
13    22      1            3                  1     1                   0.333
14    22      1            4                  2     2                   0.5  
15    22      1            5                  3     3                   0.6  
16    22      0            6                  1     0                   0    
17    22      1            7                  1     1                   0.143
18    22      0            8                  1     0                   0    
19    22      1            9                  1     1                   0.111
20    22      1           10                  2     2                   0.2  
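The dataset itself is not reproduced here. As a sketch, assuming ID and target are the only input columns, a reproducible df can be rebuilt from the ID and target columns of the table above (all other columns are derived):

```r
# ID and target copied row by row from the table above
df <- data.frame(
  ID     = rep(c(11, 22), each = 10),
  target = c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1,   # ID 11
             0, 0, 1, 1, 1, 0, 1, 0, 1, 1)   # ID 22
)

str(df)
```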
What the result should look like:

      ID target count_per_ID consecutive_target   val proportion_target_by_ID
   <dbl>  <dbl>        <int>              <int> <dbl>                   <dbl>
 1    11      0            1                  1     0                   0    
 2    11      0            2                  2     0                   0    
 3    11      0            3                  3     0                   0    
 4    11      1            4                  1     1                   0.25 
 5    11      1            5                  2     2                   0.4  
 6    11      1            6                  3     3                   0.5  
 7    11      0            7                  1     3                   0.429
 8    11      1            8                  1     4                   0.5
 9    11      1            9                  2     5                   0.556
10    11      1           10                  3     6                   0.6  
11    22      0            1                  1     0                   0    
12    22      0            2                  2     0                   0    
13    22      1            3                  1     1                   0.333
14    22      1            4                  2     2                   0.5  
15    22      1            5                  3     3                   0.6  
16    22      0            6                  1     3                   0.5
17    22      1            7                  1     4                   0.571
18    22      0            8                  1     4                   0.5
19    22      1            9                  1     5                   0.556
20    22      1           10                  2     6                   0.6  
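In the expected output, val is the running count of 1s within each ID and the proportion is that count divided by the row position. Writing t_{i,j} for the target value in row j of group i:

```latex
\[
\text{val}_{i,k} = \sum_{j=1}^{k} \mathbf{1}[t_{i,j} \neq 0],
\qquad
\text{proportion}_{i,k} = \frac{\text{val}_{i,k}}{k}
\]
% Example: row 7 of ID 11, where t_{11,1..7} = 0, 0, 0, 1, 1, 1, 0
\[
\text{proportion}_{11,7} = \frac{0+0+0+1+1+1+0}{7} = \frac{3}{7} \approx 0.429
\]
```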

One option is to change the code that creates 'val' from

val = ifelse(target == 0, 0, consecutive_target)

to

val = cumsum(target != 0)
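A minimal base-R sketch of the difference, using the ID 22 target values typed in by hand from the question's table (target_22, val_old and val_new are hypothetical names): ifelse() resets to 0 on every 0, while cumsum(target != 0) carries the running count of 1s forward.

```r
target_22 <- c(0, 0, 1, 1, 1, 0, 1, 0, 1, 1)

consecutive_target <- sequence(rle(as.character(target_22))$lengths)

# original: drops back to 0 whenever target is 0
val_old <- ifelse(target_22 == 0, 0, consecutive_target)
# fix: running count of non-zero targets, never resets within the group
val_new <- cumsum(target_22 != 0)

print(val_old)  #> [1] 0 0 1 2 3 0 1 0 1 2
print(val_new)  #> [1] 0 0 1 2 3 3 4 4 5 6
```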

-full code

df %>% 
     group_by(ID) %>% 
     mutate(count_per_ID = row_number(), 
            consecutive_target = sequence(rle(as.character(target))$lengths), 
            val = cumsum(target != 0),
            proportion_target_by_ID = val / count_per_ID)
# A tibble: 20 x 6
# Groups:   ID [2]
#      ID target count_per_ID consecutive_target   val proportion_target_by_ID
#   <dbl>  <dbl>        <int>              <int> <int>                   <dbl>
# 1    11      0            1                  1     0                   0    
# 2    11      0            2                  2     0                   0    
# 3    11      0            3                  3     0                   0    
# 4    11      1            4                  1     1                   0.25 
# 5    11      1            5                  2     2                   0.4  
# 6    11      1            6                  3     3                   0.5  
# 7    11      0            7                  1     3                   0.429
# 8    11      1            8                  1     4                   0.5  
# 9    11      1            9                  2     5                   0.556
#10    11      1           10                  3     6                   0.6  
#11    22      0            1                  1     0                   0    
#12    22      0            2                  2     0                   0    
#13    22      1            3                  1     1                   0.333
#14    22      1            4                  2     2                   0.5  
#15    22      1            5                  3     3                   0.6  
#16    22      0            6                  1     3                   0.5  
#17    22      1            7                  1     4                   0.571
#18    22      0            8                  1     4                   0.5  
#19    22      1            9                  1     5                   0.556
#20    22      1           10                  2     6                   0.6  
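If dplyr is not available, the same per-ID running proportion can be sketched in base R with ave(), which applies FUN within each group and returns a vector aligned with the original rows. The data below is rebuilt by hand from the ID and target columns of the tables above.

```r
df <- data.frame(
  ID     = rep(c(11, 22), each = 10),
  target = c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
             0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
)

# within each ID: running count of non-zero targets / row position
df$proportion_target_by_ID <- ave(
  df$target, df$ID,
  FUN = function(x) cumsum(x != 0) / seq_along(x)
)

print(round(df$proportion_target_by_ID, 3))
```

Assuming target only takes the values 0 and 1, the running mean of target over the first k rows equals val / k, so inside the same group_by(ID), mutate(proportion_target_by_ID = cummean(target)) from dplyr should give the same numbers.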