R: How to calculate a proportion by group
I am trying to calculate the consecutive (running) proportion of the target feature within each ID. This is what I tried:
df <- df %>%
  group_by(ID) %>%
  mutate(count_per_ID = row_number(),
         consecutive_target = sequence(rle(as.character(target))$lengths),
         val = ifelse(target == 0, 0, consecutive_target),
         proportion_target_by_ID = val / count_per_ID) %>%
  ungroup()
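- I created count_per_ID, which holds the running count of rows within each ID group.
- Then consecutive_target counts the observations in the target feature and restarts every time the value changes, that is, whenever target switches between 0 and 1.
- val copies the values from consecutive_target where target is 1, and is 0 where target is 0.
- proportion_target_by_ID takes val and divides it by count_per_ID.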
ID target count_per_ID consecutive_target val proportion_target_by_ID
<dbl> <dbl> <int> <int> <dbl> <dbl>
1 11 0 1 1 0 0
2 11 0 2 2 0 0
3 11 0 3 3 0 0
4 11 1 4 1 1 0.25
5 11 1 5 2 2 0.4
6 11 1 6 3 3 0.5
7 11 0 7 1 0 0
8 11 1 8 1 1 0.125
9 11 1 9 2 2 0.222
10 11 1 10 3 3 0.3
11 22 0 1 1 0 0
12 22 0 2 2 0 0
13 22 1 3 1 1 0.333
14 22 1 4 2 2 0.5
15 22 1 5 3 3 0.6
16 22 0 6 1 0 0
17 22 1 7 1 1 0.143
18 22 0 8 1 0 0
19 22 1 9 1 1 0.111
20 22 1 10 2 2 0.2
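The reset in the val column can be reproduced on a small vector. The sketch below is only an illustration: it copies the target values of ID 11 from the table into a standalone vector (the variable names runs and val are reused for convenience) and walks through what rle() and sequence() return.

```r
# Toy vector mirroring the target values of ID 11 in the table
target <- c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1)

# rle() reports the length of each run of identical values
runs <- rle(as.character(target))$lengths          # 3 3 1 3

# sequence() counts 1..n inside each run, so it restarts at every switch
consecutive_target <- sequence(runs)               # 1 2 3 1 2 3 1 1 2 3

# Zeroing the rows where target == 0 keeps only the within-run counter,
# which is why val drops back to 1 after the 0 in row 7
val <- ifelse(target == 0, 0, consecutive_target)  # 0 0 0 1 2 3 0 1 2 3
```

Because the counter is tied to runs, it cannot carry the earlier 1s forward across an intervening 0.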
What the result should look like:
ID target count_per_ID consecutive_target val proportion_target_by_ID
<dbl> <dbl> <int> <int> <dbl> <dbl>
1 11 0 1 1 0 0
2 11 0 2 2 0 0
3 11 0 3 3 0 0
4 11 1 4 1 1 0.25
5 11 1 5 2 2 0.4
6 11 1 6 3 3 0.5
7 11 0 7 1 3 0.428
8 11 1 8 1 4 0.5
9 11 1 9 2 5 0.555
10 11 1 10 3 6 0.6
11 22 0 1 1 0 0
12 22 0 2 2 0 0
13 22 1 3 1 1 0.333
14 22 1 4 2 2 0.5
15 22 1 5 3 3 0.6
16 22 0 6 1 3 0.5
17 22 1 7 1 4 0.571
18 22 0 8 1 4 0.5
19 22 1 9 1 5 0.55
20 22 1 10 2 6 0.6
One option is to change the code that creates 'val' from
val = ifelse(target == 0, 0, consecutive_target)
to
val = cumsum(target != 0)
-Full code
df %>%
  group_by(ID) %>%
  mutate(count_per_ID = row_number(),
         consecutive_target = sequence(rle(as.character(target))$lengths),
         val = cumsum(target != 0),
         proportion_target_by_ID = val / count_per_ID)
# A tibble: 20 x 6
# Groups: ID [2]
# ID target count_per_ID consecutive_target val proportion_target_by_ID
# <dbl> <dbl> <int> <int> <int> <dbl>
# 1 11 0 1 1 0 0
# 2 11 0 2 2 0 0
# 3 11 0 3 3 0 0
# 4 11 1 4 1 1 0.25
# 5 11 1 5 2 2 0.4
# 6 11 1 6 3 3 0.5
# 7 11 0 7 1 3 0.429
# 8 11 1 8 1 4 0.5
# 9 11 1 9 2 5 0.556
#10 11 1 10 3 6 0.6
#11 22 0 1 1 0 0
#12 22 0 2 2 0 0
#13 22 1 3 1 1 0.333
#14 22 1 4 2 2 0.5
#15 22 1 5 3 3 0.6
#16 22 0 6 1 3 0.5
#17 22 1 7 1 4 0.571
#18 22 0 8 1 4 0.5
#19 22 1 9 1 5 0.556
#20 22 1 10 2 6 0.6
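For readers without dplyr, the same running proportion can be sketched in base R with ave(). This is an alternative illustration, not part of the answer above: the data frame is rebuilt from the ID and target columns shown in the output, and the helper variable hit is an assumption introduced here.

```r
# Rebuild the example data from the ID and target columns shown above
df <- data.frame(
  ID     = rep(c(11, 22), each = 10),
  target = c(0, 0, 0, 1, 1, 1, 0, 1, 1, 1,
             0, 0, 1, 1, 1, 0, 1, 0, 1, 1)
)

# 1 where target is non-zero, 0 otherwise (hypothetical helper variable)
hit <- as.numeric(df$target != 0)

# Running row position within each ID
df$count_per_ID <- ave(hit, df$ID, FUN = seq_along)

# Running count of non-zero targets within each ID; never resets on a 0
df$val <- ave(hit, df$ID, FUN = cumsum)

# Running proportion of non-zero targets seen so far in the group
df$proportion_target_by_ID <- df$val / df$count_per_ID

print(df)
```

The design point is the same as in the dplyr version: cumsum() accumulates over the whole group, whereas the run-based counter starts over after every change in target.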