R: How to change subsequent row values if multiple groups meet some condition?


I have a data frame that looks like this:

ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0
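I would like to change the ID name when the condition is met, and also change the ID names in the rows that follow. Each ID can meet the condition multiple times, so I want the name to change each time.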

The result can either overwrite the original ID or just add a new column:

ID  value   condition   newID
A   0              0    A
A   3              0    A
A   0              1    A1
A   7              1    A1
A   5              0    A2
A   5              0    A2
A   5              0    A2
A   7              0    A2
B   6              0    B
B   2              1    B1
B   7              0    B2
B   10             1    B3
B   0              0    B4
B   6              0    B4

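One option, after grouping by 'ID', is to use rleid (from data.table) to create a run index, and then in case_when keep the bare ID where the index is 0 and paste the index onto the ID everywhere else: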

library(dplyr)
library(data.table)
df %>% 
   group_by(ID) %>%
   mutate(newID = rleid(condition)-1,
          newID = case_when(newID == 0 ~ first(ID), TRUE ~ paste0(first(ID), newID)))
# A tibble: 14 x 4
# Groups:   ID [2]
#   ID    value condition newID
#   <chr> <int>     <int> <chr>
# 1 A         0         0 A    
# 2 A         3         0 A    
# 3 A         0         1 A1   
# 4 A         7         1 A1   
# 5 A         5         0 A2   
# 6 A         5         0 A2   
# 7 A         5         0 A2   
# 8 A         7         0 A2   
# 9 B         6         0 B    
#10 B         2         1 B1   
#11 B         7         0 B2   
#12 B        10         1 B3   
#13 B         0         0 B4   
#14 B         6         0 B4   

You can also do:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(newID = cumsum(c(0, (condition != lag(condition))[-1])),
         newID = ifelse(newID != 0, paste0(ID, newID), ID))
Output:

# A tibble: 14 x 4
# Groups:   ID [2]
   ID    value condition newID
   <chr> <int>     <int> <chr>
 1 A         0         0 A    
 2 A         3         0 A    
 3 A         0         1 A1   
 4 A         7         1 A1   
 5 A         5         0 A2   
 6 A         5         0 A2   
 7 A         5         0 A2   
 8 A         7         0 A2   
 9 B         6         0 B    
10 B         2         1 B1   
11 B         7         0 B2   
12 B        10         1 B3   
13 B         0         0 B4   
14 B         6         0 B4  

Same idea as @akrun's, but using only data.table:

library(data.table)
setDT(df)

df[, newID := paste0(ID, gsub('^0$', '', rleid(condition) - 1)), ID]
df
#     ID value condition newID
#  1:  A     0         0     A
#  2:  A     3         0     A
#  3:  A     0         1    A1
#  4:  A     7         1    A1
#  5:  A     5         0    A2
#  6:  A     5         0    A2
#  7:  A     5         0    A2
#  8:  A     7         0    A2
#  9:  B     6         0     B
# 10:  B     2         1    B1
# 11:  B     7         0    B2
# 12:  B    10         1    B3
# 13:  B     0         0    B4
# 14:  B     6         0    B4

If I understand correctly, the OP wants to create sub-groups for each run of consecutive identical condition values within each ID.
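
To make the idea of "runs within each ID" concrete, here is a minimal base R sketch that does not depend on data.table. The helper name run_index and the column name subgroup are made up for illustration; the data is the question's example rebuilt as a plain data.frame.

```r
# The question's data, rebuilt as a plain data.frame.
df <- data.frame(
  ID        = rep(c("A", "B"), times = c(8, 6)),
  value     = c(0, 3, 0, 7, 5, 5, 5, 7, 6, 2, 7, 10, 0, 6),
  condition = c(0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0)
)

# run_index() is a hypothetical helper: it numbers the runs of equal
# consecutive values in x, starting at 1 (similar in spirit to rleid()).
run_index <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), times = r$lengths)
}

# ave() applies the helper separately within each ID.
df$subgroup <- ave(df$condition, df$ID, FUN = run_index)
df$subgroup
# 1 1 2 2 3 3 3 3 1 2 3 4 5 5
```

Each change of condition inside an ID starts a new run, which is why B ends up with five sub-groups.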

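Unfortunately, the OP has asked for the sub-groups to be named in a peculiar way, which makes the solutions overly complicated. According to the OP's request, the sub-groups are to be named, e.g., A, A1, A2, which means that sub-group number and sub-group name are off by one: the second sub-group is named A1, the third sub-group is named A2, and so on.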
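
The off-by-one naming can be isolated in a small helper, sketched here in base R. The function name make_names is hypothetical and only serves to illustrate the mapping from run number to requested name.

```r
# make_names() is a hypothetical helper, not part of any package.
# id:  the group label, e.g. "A"
# run: the 1-based sub-group number within that group
make_names <- function(id, run) {
  suffix <- run - 1                            # second run gets suffix 1, third gets 2, ...
  paste0(id, ifelse(suffix == 0, "", suffix))  # the first run keeps the bare ID
}

named <- make_names("A", c(1, 1, 2, 2, 3))
named
# "A"  "A"  "A1" "A1" "A2"
```

The special case for the first run is what forces the extra case_when, ifelse, or gsub step in the solutions above.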

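If a more streamlined naming scheme is acceptable, we can benefit directly from the prefix parameter of the rleid() function. Then the first sub-group of group A will be named A1, the second sub-group will be named A2, and so on.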

dplyr:

library(dplyr)
df %>% 
  group_by(ID) %>% 
  mutate(newID = data.table::rleid(condition, prefix = first(ID)))
# A tibble: 14 x 4
# Groups:   ID [2]
   ID    value condition newID
   <chr> <int>     <int> <chr>
 1 A         0         0 A1   
 2 A         3         0 A1   
 3 A         0         1 A2   
 4 A         7         1 A2   
 5 A         5         0 A3   
 6 A         5         0 A3   
 7 A         5         0 A3   
 8 A         7         0 A3   
 9 B         6         0 B1   
10 B         2         1 B2   
11 B         7         0 B3   
12 B        10         1 B4   
13 B         0         0 B5   
14 B         6         0 B5

data.table:

library(data.table)
setDT(df)[, newID := rleid(condition, prefix = ID), ID][]
    ID value condition newID
 1:  A     0         0    A1
 2:  A     3         0    A1
 3:  A     0         1    A2
 4:  A     7         1    A2
 5:  A     5         0    A3
 6:  A     5         0    A3
 7:  A     5         0    A3
 8:  A     7         0    A3
 9:  B     6         0    B1
10:  B     2         1    B2
11:  B     7         0    B3
12:  B    10         1    B4
13:  B     0         0    B5
14:  B     6         0    B5

Data:

library(data.table)
df <- fread("ID  value   condition
A   0         0
A   3         0
A   0         1
A   7         1
A   5         0
A   5         0
A   5         0
A   7         0
B   6         0
B   2         1
B   7         0
B   10        1
B   0         0
B   6         0")