R: How to index values in a column within groups defined by another column?
I have a data frame:
dput(df)
structure(list(ID = c("A1", "A1", "A1", "A1", "A1", "A1", "B2",
"B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2", "B2"), operation = c("open",
"open", "close", "", "open", "close", "", "open", "open", "open",
"close", "upload", "open", "close", "open", "close")), class = "data.frame", row.names = c(NA,
-16L))
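I want to add an index for each "open" ... "close" bundle in the operation column, so that every row from an "open" through its matching "close" gets the same index. Rows outside a bundle get no index. A single running index across all IDs would look like this:

ID operation index
A1 open      1
A1 open      1
A1 close     1
A1
A1 open      2
A1 close     2
B2
B2 open      3
B2 open      3
B2 open      3
B2 close     3
B2 upload
B2 open      4
B2 close     4
B2 open      5
B2 close     5

But the index has to restart within each ID, so the desired result is:

ID operation index
A1 open      1
A1 open      1
A1 close     1
A1
A1 open      2
A1 close     2
B2
B2 open      1
B2 open      1
B2 open      1
B2 close     1
B2 upload
B2 open      2
B2 close     2
B2 open      3
B2 close     3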
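This is what I tried. It produces the running index shown in the first table above, which does not restart for each ID: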
rank <- df$operation == "close" & !is.na(df$operation)
df$index <- cumsum(c(1, rank[-length(rank)]))
df$index[!df$operation %in% c("open", "close")] <- NA
How can I do that?

Here is one approach using data.table, starting from the index column computed in the question:
library(data.table)
# convert df to a data.table by reference
setDT(df)
# within each ID, renumber the runs of equal index values from 1; NA rows are left untouched
df[!is.na(index), index := rleid(index), by = .(ID)]
df
#     ID operation index
#  1: A1      open     1
#  2: A1      open     1
#  3: A1     close     1
#  4: A1              NA
#  5: A1      open     2
#  6: A1     close     2
#  7: B2              NA
#  8: B2      open     1
#  9: B2      open     1
# 10: B2      open     1
# 11: B2     close     1
# 12: B2    upload    NA
# 13: B2      open     2
# 14: B2     close     2
# 15: B2      open     3
# 16: B2     close     3
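To see why rleid() restarts the numbering, it can help to look at one group in isolation. The sketch below is only an illustration: the vector running_index is a hand-copied stand-in (not part of the original answer) for the non-NA index values of ID "B2" from the question's cumsum() step.

```r
library(data.table)

# Hypothetical stand-in: the running index for ID "B2" after the
# !is.na(index) filter has dropped the NA rows.
running_index <- c(3, 3, 3, 3, 4, 4, 5, 5)

# rleid() assigns 1 to the first run of equal consecutive values,
# 2 to the next run, and so on, regardless of the values themselves.
per_id_index <- rleid(running_index)
print(per_id_index)
```

Because the call in the answer is grouped with by = .(ID), this renumbering happens separately for each ID, which is what makes the index start again at 1.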
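Note: R has built-in functions named dt and df (in the stats package, the densities of the t and F distributions). R can tell a variable from a function of the same name, but it is still better to choose other names.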
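As a variation, the per-ID index can probably also be built in one pass, without first computing the running index. This is a minimal sketch rather than the answer's method: it assumes the same data as in the question, and the table name ops is a hypothetical choice made to avoid the name df.

```r
library(data.table)

# Same data as in the question, under a name that does not mask stats::df.
ops <- data.table(
  ID = rep(c("A1", "B2"), times = c(6, 10)),
  operation = c("open", "open", "close", "", "open", "close",
                "", "open", "open", "open", "close", "upload",
                "open", "close", "open", "close")
)

# Within each ID, a new bundle starts on the row after a "close":
# shift() lags the close flag by one row and cumsum() counts the
# closes seen so far, so adding 1 gives the bundle number.
ops[, index := cumsum(shift(operation == "close", fill = FALSE)) + 1L, by = ID]

# Rows that are neither "open" nor "close" belong to no bundle.
ops[!operation %in% c("open", "close"), index := NA_integer_]

print(ops)
```

Grouping with by = ID is what resets the count for each ID; the second statement mirrors the NA assignment from the question.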