R: How can I efficiently identify sequence changes across multiple columns in a data.table?
I have a very large data.table with the following columns, where pos1 and pos2 give aligned sequences for the different categories cat1 and cat2:
set.seed(1)
library(data.table)
x <- 1:60
y <- 100:41
dt <- data.table(cat1 = c(rep("A", 40), rep("B", 60)),
                 cat2 = c(rep("A", 75), rep("C", 25)),
                 pos1 = c(x[-sample(x, 10)], x[-sample(x, 10)]),
                 pos2 = c(x[-sample(x, 10)], y[-sample(x, 10)]))
Since you are using the data.table package, I would do the following:
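# Run ids per column: a new id starts whenever a position jumps by more than 1
# or a category changes; grp then numbers each distinct combination of the ids
dt[, `:=`(pos1id = rleid(cumsum(abs(diff(c(0, pos1))) > 1)),
          pos2id = rleid(cumsum(abs(diff(c(0, pos2))) > 1)),
          cat1id = rleid(cat1),
          cat2id = rleid(cat2))][
  , `:=`(grp = .GRP), by = c("pos1id", "pos2id", "cat1id", "cat2id")
]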
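To see what the position ids capture, here is a minimal sketch on a made-up vector (`pos` and its values are hypothetical, not taken from the question). A step larger than 1 between neighbouring values marks a break, and `cumsum` turns those breaks into a running segment counter.

```r
library(data.table)

# Hypothetical toy positions: runs 1-3 and 7-8, then a descending run 20-18
pos <- c(1, 2, 3, 7, 8, 20, 19, 18)

# Absolute step from the previous value (0 is prepended so lengths match)
step <- abs(diff(c(0, pos)))

# TRUE wherever the sequence jumps by more than 1
brk <- step > 1

# Running count of breaks acts as a segment counter; rleid() renumbers it from 1
seg <- rleid(cumsum(brk))
print(seg)
```

Because the step is taken as an absolute value, ascending and descending runs are both treated as continuous sequences.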
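The := operator modifies the data by reference, which is usually fast. You also do not have to do all of this in a single step: you can create the id columns one at a time and then index their combinations with data.table's built-in .GRP symbol.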
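The step-by-step variant mentioned above might look like the following sketch. The table `toy` and its values are assumptions made up for illustration; the column names mirror those in the question. Each `:=` call adds columns by reference, and the final call numbers every distinct combination of ids with `.GRP`.

```r
library(data.table)

# Hypothetical toy table with the same column names as in the question
toy <- data.table(
  cat1 = c("A", "A", "A", "B", "B", "B"),
  cat2 = c("A", "A", "A", "A", "C", "C"),
  pos1 = c(1, 2, 5, 1, 2, 3),
  pos2 = c(1, 2, 3, 4, 9, 8)
)

# Step 1: run ids for the categories
toy[, `:=`(cat1id = rleid(cat1), cat2id = rleid(cat2))]

# Step 2: run ids for the positions (a jump larger than 1 starts a new run)
toy[, pos1id := rleid(cumsum(abs(diff(c(0, pos1))) > 1))]
toy[, pos2id := rleid(cumsum(abs(diff(c(0, pos2))) > 1))]

# Step 3: one group number per distinct combination of the four ids
toy[, grp := .GRP, by = c("pos1id", "pos2id", "cat1id", "cat2id")]
print(toy$grp)
```

Splitting the work this way makes it easy to inspect each id column before combining them, at the cost of a few more passes over the table.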