R: How to flag groups in a df based on a specific sequence of values in a column

I have a data frame with id and value columns as shown below, and I want to determine the status column for each id group from the values in the value column.

x <- data.frame(id = c(rep(1,10), rep(2,10), rep(3,10)),
serial = rep(1:10,3),
value = c(rep(1,4), rep(0,3), rep(1,3),
rep(1,4), rep(0,1), rep(-1,2), rep(1,3),
rep(c(1,0),5)),
status = c(rep("Fluctuating", 10),
rep("Fluctuating", 10),
rep("Not fluctuating", 10)))
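
Here, a group is considered fluctuating if three or more 1s are followed by three or more (0s or -1s), which are in turn followed by three or more 1s. Alternating runs of three or more values, such as 0s-1s-0s or -1s-0s-1s, also count as fluctuating.

I would like to know the best way to assign the status column, preferably using dplyr.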
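Thanks.

library(dplyr)
# library(zoo)

# three(): helper applied to each group's values; its definition is not shown here
x %>%
  group_by(id) %>%
  mutate(status2 = paste0(if (three(value)) "" else "Not ", "Fluctuating")) %>%
  ungroup() %>%
  print(n = 99)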
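## # A tibble: 30 x 5
##       id serial value status          status2
##  1     1      1     1 Fluctuating     Fluctuating
##  2     1      2     1 Fluctuating     Fluctuating
##  3     1      3     1 Fluctuating     Fluctuating
##  4     1      4     1 Fluctuating     Fluctuating
##  5     1      5     0 Fluctuating     Fluctuating
##  6     1      6     0 Fluctuating     Fluctuating
##  7     1      7     0 Fluctuating     Fluctuating
##  8     1      8     1 Fluctuating     Fluctuating
##  9     1      9     1 Fluctuating     Fluctuating
## 10     1     10     1 Fluctuating     Fluctuating
## 11     2      1     1 Fluctuating     Fluctuating
## 12     2      2     1 Fluctuating     Fluctuating
## 13     2      3     1 Fluctuating     Fluctuating
## 14     2      4     1 Fluctuating     Fluctuating
## 15     2      5     0 Fluctuating     Fluctuating
## 16     2      6    -1 Fluctuating     Fluctuating
## 17     2      7    -1 Fluctuating     Fluctuating
## 18     2      8     1 Fluctuating     Fluctuating
## 19     2      9     1 Fluctuating     Fluctuating
## 20     2     10     1 Fluctuating     Fluctuating
## 21     3      1     1 Not fluctuating Not Fluctuating
## 22     3      2     0 Not fluctuating Not Fluctuating
## 23     3      3     1 Not fluctuating Not Fluctuating
## 24     3      4     0 Not fluctuating Not Fluctuating
## 25     3      5     1 Not fluctuating Not Fluctuating
## 26     3      6     0 Not fluctuating Not Fluctuating
## 27     3      7     1 Not fluctuating Not Fluctuating
## 28     3      8     0 Not fluctuating Not Fluctuating
## 29     3      9     1 Not fluctuating Not Fluctuating
## 30     3     10     0 Not fluctuating Not Fluctuating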
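
The pipeline above calls a helper on each group's values, but the helper's body is not part of this text. The following is only a minimal base-R sketch of what such a helper could look like, not the answerer's original. The name `three`, the parameters `min_len` and `n_runs`, and the choice to treat -1 like 0 are all assumptions made for illustration.

```r
# Hypothetical helper (an assumption, not the original definition):
# TRUE if the vector contains at least `n_runs` consecutive runs that
# are each at least `min_len` long.
three <- function(v, min_len = 3, n_runs = 3) {
  v <- ifelse(v == -1, 0, v)           # assumption: -1 counts as 0
  long <- rle(v)$lengths >= min_len    # one TRUE/FALSE flag per run
  streaks <- rle(long)                 # streaks of consecutive long runs
  any(streaks$values & streaks$lengths >= n_runs)
}

three(c(rep(1, 4), rep(0, 3), rep(1, 3)))   # values of id 1
three(c(rep(1, 4), 0, -1, -1, rep(1, 3)))   # values of id 2
three(rep(c(1, 0), 5))                      # values of id 3
```

Once -1 is mapped to 0, neighbouring runs always differ in value, so three consecutive long runs necessarily alternate between 1 and 0. The sketch assumes the values are limited to -1, 0 and 1 and contain no NA.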

Using the rle function and the dplyr library:

x %>%
mutate(value_new = ifelse(value == -1, 0, value)) %>%
group_by(id) %>%
mutate(status = ifelse(all(rle(value_new)$lengths >= 3), "Fluctuating", "Not fluctuating")) %>%
select(-value_new)

Output:

# A tibble: 30 x 4
# Groups: id [3]
id serial value status
<dbl> <int> <dbl> <chr>
1 1 1 1 Fluctuating
2 1 2 1 Fluctuating
3 1 3 1 Fluctuating
4 1 4 1 Fluctuating
5 1 5 0 Fluctuating
6 1 6 0 Fluctuating
7 1 7 0 Fluctuating
8 1 8 1 Fluctuating
9 1 9 1 Fluctuating
10 1 10 1 Fluctuating
11 2 1 1 Fluctuating
12 2 2 1 Fluctuating
13 2 3 1 Fluctuating
14 2 4 1 Fluctuating
15 2 5 0 Fluctuating
16 2 6 -1 Fluctuating
17 2 7 -1 Fluctuating
18 2 8 1 Fluctuating
19 2 9 1 Fluctuating
20 2 10 1 Fluctuating
21 3 1 1 Not fluctuating
22 3 2 0 Not fluctuating
23 3 3 1 Not fluctuating
24 3 4 0 Not fluctuating
25 3 5 1 Not fluctuating
26 3 6 0 Not fluctuating
27 3 7 1 Not fluctuating
28 3 8 0 Not fluctuating
29 3 9 1 Not fluctuating
30 3 10 0 Not fluctuating
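
To see why the `rle` check separates the groups, it can help to inspect the run lengths directly. This is a small sketch that rebuilds the three value vectors from the question; the helper name `collapse` is made up here and only mirrors the `value_new` step of the answer.

```r
# Value vectors for id 1, id 2 and id 3, copied from the question's data
v1 <- c(rep(1, 4), rep(0, 3), rep(1, 3))
v2 <- c(rep(1, 4), rep(0, 1), rep(-1, 2), rep(1, 3))
v3 <- rep(c(1, 0), 5)

# Hypothetical helper: treat -1 like 0, as the value_new column does
collapse <- function(v) ifelse(v == -1, 0, v)

rle(collapse(v1))$lengths   # 4 3 3
rle(collapse(v1))$values    # 1 0 1
rle(collapse(v2))$lengths   # 4 3 3
rle(collapse(v3))$lengths   # ten runs of length 1

all(rle(collapse(v1))$lengths >= 3)   # TRUE
all(rle(collapse(v3))$lengths >= 3)   # FALSE
```

Groups 1 and 2 collapse to the same three runs, while group 3 never stays on one value for more than a single row.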
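
This will trigger "Fluctuating" even if the sequence contains 0s/1s/0s rather than 1s/0s/1s.

You are right. I assumed the order of the 1s and 0s does not matter, as long as each run is at least 3 long.

"Three or more 1s followed by 3 or more (0s or -1s), followed by 3 or more 1s." I see the logic you are assuming, but if that is the case, the OP should update their question.

Yes, I understand that. I inferred that something might be off, and I did ask for clarification.

Apologies for the incomplete example; I have clarified the question.

Should a fluctuating sequence always start with 1, or can it also start with 0, as long as the 1s and 0s alternate in runs of at least 3?

As long as they alternate!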
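
The clarified rule from the comments (the start value does not matter, only the alternation) can be checked against a few edge cases. The sketch below wraps the check from the rle answer in a function named `all_runs_long`, and adds a stricter variant named `alternates`. Both names are invented here, and the extra "at least three runs" condition is an assumption about what "alternating" should mean, not something the thread confirms.

```r
# The check from the rle answer, wrapped as a function for testing
all_runs_long <- function(v) {
  all(rle(ifelse(v == -1, 0, v))$lengths >= 3)
}

all_runs_long(c(0, 0, 0, 1, 1, 1, 0, 0, 0))   # TRUE: starting with 0 is fine
all_runs_long(rep(1, 10))                     # TRUE: one long run, no alternation

# Assumed stricter variant: also require at least three runs
alternates <- function(v) {
  r <- rle(ifelse(v == -1, 0, v))
  length(r$lengths) >= 3 && all(r$lengths >= 3)
}

alternates(c(0, 0, 0, 1, 1, 1, 0, 0, 0))      # TRUE
alternates(rep(1, 10))                        # FALSE
```

Whether a constant group should count as fluctuating is not settled in the thread, so the stricter variant is only one possible reading.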