R: subsetting by matching contiguous blocks
I have a data frame:
dat <- data.frame(k = c("A","A","B","B","B","A","A","A"),
                  a = c(4,2,4,7,5,8,3,2), b = c(2,5,3,5,8,4,5,8),
                  stringsAsFactors = FALSE)
  k a b
1 A 4 2
2 A 2 5
3 B 4 3
4 B 7 5
5 B 5 8
6 A 8 4
7 A 3 5
8 A 2 8
We can use rleid from data.table to create a grouping variable:
library(data.table)
setDT(dat)[, grp := rleid(k)]
dat
#    k a b grp
# 1: A 4 2   1
# 2: A 2 5   1
# 3: B 4 3   2
# 4: B 7 5   2
# 5: B 5 8   2
# 6: A 8 4   3
# 7: A 3 5   3
# 8: A 2 8   3
We can then group by 'grp' and use standard data.table methods to perform any operation within each 'grp'.
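As a rough illustration of grouping by 'grp', the sketch below recreates the example data and computes one summary row per contiguous block. The summary columns (n, sum_a, mean_b) and the name block_summary are illustrative assumptions, not something the original question asked for.

```r
library(data.table)

# Recreate the example data from the question
dat <- data.frame(k = c("A","A","B","B","B","A","A","A"),
                  a = c(4,2,4,7,5,8,3,2),
                  b = c(2,5,3,5,8,4,5,8),
                  stringsAsFactors = FALSE)

# rleid() gives each run of identical consecutive k values its own id
setDT(dat)[, grp := rleid(k)]

# One row per contiguous block; the summary columns are hypothetical examples
block_summary <- dat[, .(k = k[1], n = .N, sum_a = sum(a), mean_b = mean(b)),
                     by = grp]
print(block_summary)
```

Because the two runs of "A" receive different 'grp' values (1 and 3), they are summarised separately, which grouping by k alone would not do.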
Here is a base R option for creating 'grp':
dat$grp <- with(dat, cumsum(c(TRUE, k[-1] != k[-length(k)])))
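To make the base R one-liner easier to follow, here is a sketch that splits it into steps on the same k values. The intermediate names (changed, starts, grp_rle) are introduced only for illustration, and the rle() variant is an alternative construction added for comparison rather than part of the original answer.

```r
k <- c("A","A","B","B","B","A","A","A")

# Compare each element with its predecessor: TRUE where the value changes
changed <- k[-1] != k[-length(k)]

# The first element always opens a new block, so prepend TRUE
starts <- c(TRUE, changed)

# The running count of block starts is the block id
grp <- cumsum(starts)
print(grp)
# [1] 1 1 2 2 2 3 3 3

# Alternative: repeat each run's index by that run's length
r <- rle(k)
grp_rle <- rep(seq_along(r$lengths), r$lengths)
print(identical(grp, grp_rle))
```

Both constructions need only base R, so they may be handy when data.table is not available.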
Ah, of course! length. Thanks!