R: How to change subsequent row values when a condition is met, across multiple groups?
I have a data frame that looks like this:
ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0
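I would like to change the ID name when the condition is met, and also change the ID name of the rows that follow. The condition can be met several times per ID, so I want the name to be modified each time.
The result would either change the original ID or just add a new column: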
ID value condition newID
A 0 0 A
A 3 0 A
A 0 1 A1
A 7 1 A1
A 5 0 A2
A 5 0 A2
A 5 0 A2
A 7 0 A2
B 6 0 B
B 2 1 B1
B 7 0 B2
B 10 1 B3
B 0 0 B4
B 6 0 B4
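One option, after grouping by 'ID', is to create an index with rleid (from data.table) and then, in case_when, paste the first ID onto that index wherever it is non-zero: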
library(dplyr)
library(data.table)
df1 %>%
group_by(ID) %>%
mutate(newID = rleid(condition)-1,
newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups: ID [2]
# ID value condition newID
# <chr> <int> <int> <chr>
# 1 A 0 0 A
# 2 A 3 0 A
# 3 A 0 1 A1
# 4 A 7 1 A1
# 5 A 5 0 A2
# 6 A 5 0 A2
# 7 A 5 0 A2
# 8 A 7 0 A2
# 9 B 6 0 B
#10 B 2 1 B1
#11 B 7 0 B2
#12 B 10 1 B3
#13 B 0 0 B4
#14 B 6 0 B4
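To make the intermediate index easier to see, here is a minimal sketch that applies rleid() to the condition values of group A on their own. The vector name cond_A is only an illustration; its values are copied from the question.

```r
library(data.table)

# condition values of group A, copied from the question
cond_A <- c(0, 0, 1, 1, 0, 0, 0, 0)

# rleid() numbers each run of identical consecutive values, starting at 1
run <- rleid(cond_A)
run
# [1] 1 1 2 2 3 3 3 3

# subtracting 1 makes the first run 0, which case_when() then maps to the bare ID
run - 1
# [1] 0 0 1 1 2 2 2 2
```

Every change in condition, in either direction, starts a new run, which is why group B ends up with five distinct names.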
You can also do:
library(dplyr)
df %>%
group_by(ID) %>%
mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
newID = ifelse(newID != 0, paste0(ID, newID), ID))
Output:
# A tibble: 14 x 4
# Groups: ID [2]
ID value condition newID
<chr> <int> <int> <chr>
1 A 0 0 A
2 A 3 0 A
3 A 0 1 A1
4 A 7 1 A1
5 A 5 0 A2
6 A 5 0 A2
7 A 5 0 A2
8 A 7 0 A2
9 B 6 0 B
10 B 2 1 B1
11 B 7 0 B2
12 B 10 1 B3
13 B 0 0 B4
14 B 6 0 B4
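The cumsum() step can be checked on a plain vector without any packages. This is a small sketch in base R, assuming the condition values of group B from the question; the names cond_B, changed and idx are illustrative only.

```r
# condition values of group B, copied from the question
cond_B <- c(0, 1, 0, 1, 0, 0)

# TRUE wherever a value differs from the one before it
# (the first element has no predecessor, so it is left out here)
changed <- cond_B[-1] != cond_B[-length(cond_B)]
changed
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# prepend 0 for the first row and accumulate:
# the counter rises by one at every change
idx <- cumsum(c(0, changed))
idx
# [1] 0 1 2 3 4 4
```

The counter values 0, 1, 2, 3, 4, 4 correspond to B, B1, B2, B3, B4, B4 in the output above.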
Same idea as @akrun's, but using only data.table:
library(data.table)
setDT(df)
df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
# ID value condition newID
# 1: A 0 0 A
# 2: A 3 0 A
# 3: A 0 1 A1
# 4: A 7 1 A1
# 5: A 5 0 A2
# 6: A 5 0 A2
# 7: A 5 0 A2
# 8: A 7 0 A2
# 9: B 6 0 B
# 10: B 2 1 B1
# 11: B 7 0 B2
# 12: B 10 1 B3
# 13: B 0 0 B4
# 14: B 6 0 B4
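If I understand correctly, the OP wants to create subgroups within each ID, one for each run of consecutive identical condition values.
Unfortunately, the OP has asked for the subgroups to be named in a peculiar way, which makes the solutions more complicated than necessary. Following the OP's request, the subgroups would be named, e.g., A, A1, A2, which means the subgroup number and the subgroup name are off by one: the second subgroup is named A1, the third subgroup is named A2, and so on.
If a more streamlined naming scheme is acceptable, we can benefit directly from the prefix parameter of the rleid() function. The first subgroup of group A is then named A1, the second subgroup A2, and so on.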
dplyr
library(dplyr)
df %>%
group_by(ID) %>%
mutate(newID = data.table::rleid(condition, prefix = first(ID)))
# A tibble: 14 x 4
# Groups: ID [2]
ID value condition newID
<chr> <int> <int> <chr>
1 A 0 0 A1
2 A 3 0 A1
3 A 0 1 A2
4 A 7 1 A2
5 A 5 0 A3
6 A 5 0 A3
7 A 5 0 A3
8 A 7 0 A3
9 B 6 0 B1
10 B 2 1 B2
11 B 7 0 B3
12 B 10 1 B4
13 B 0 0 B5
14 B 6 0 B5
data.table
library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
ID value condition newID
1: A 0 0 A1
2: A 3 0 A1
3: A 0 1 A2
4: A 7 1 A2
5: A 5 0 A3
6: A 5 0 A3
7: A 5 0 A3
8: A 7 0 A3
9: B 6 0 B1
10: B 2 1 B2
11: B 7 0 B3
12: B 10 1 B4
13: B 0 0 B5
14: B 6 0 B5
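For readers who prefer not to load a package, the same streamlined names can be sketched in base R. This is an illustration under stated assumptions rather than part of the original answers: run_id() is a hypothetical helper, and the data frame is rebuilt inline from the question so that the block is self-contained.

```r
# data from the question, rebuilt inline
df <- data.frame(
  ID        = rep(c("A", "B"), times = c(8, 6)),
  value     = c(0, 3, 0, 7, 5, 5, 5, 7, 6, 2, 7, 10, 0, 6),
  condition = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0)
)

# run_id() (hypothetical helper) numbers runs of identical
# consecutive values from 1, as rleid() does for a single vector
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# ave() applies run_id() within each ID; paste0() prepends the ID
df$newID <- paste0(df$ID, ave(df$condition, df$ID, FUN = run_id))
df
```

The newID column should match the A1, A1, A2, ... and B1, B2, ... names produced by the prefix argument above.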
Data
library(data.table)
df <- fread("ID value condition
A 0 0
A 3 0
A 0 1
A 7 1
A 5 0
A 5 0
A 5 0
A 7 0
B 6 0
B 2 1
B 7 0
B 10 1
B 0 0
B 6 0")