R: index of matching values between two data.tables
This is my first post on StackOverflow. I am relatively new to programming and am trying to use data.table in R because of its reputation for speed.
I have a very large data.table named "Actions" with 5 columns and potentially several million rows. The column names are k1, k2, i, l1 and l2. I have another data.table, named "States", holding the unique values of Actions in columns k1 and k2.
For every row in Actions, I would like to find the unique index for columns 4 and 5 (l1 and l2), matched against States. Reproducible code is as follows:
library(data.table)
S.disc <- c(2000,2000)
S.max <- c(6200,2300)
S.min <- c(700,100)
Traces.num <- 3
Class.str <- lapply(1:2,function(x) seq(S.min[x],S.max[x],S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]], Class.str[[2]], Class.str[[1]])[,c(5,4,1,3,2)])
setnames(Actions,c("k1","k2","i","l1","l2"))
States <- unique(Actions[,list(k1,k2,i)])
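To make the goal concrete, here is a brute-force sketch on two tiny made-up tables. `States_toy` and `Actions_toy` are hypothetical stand-ins, not the data built above, and plain data frames are used so the snippet needs no packages. It only pins down what "the index in States for each row of Actions" means; it loops row by row and would be far too slow for millions of rows.

```r
# Hypothetical toy tables, not the poster's data
States_toy <- data.frame(k1 = c(700, 700, 2700),
                         k2 = c(100, 100, 100),
                         i  = c(1, 2, 1))
Actions_toy <- data.frame(l1 = c(700, 2700),
                          l2 = c(100, 100))

# For each row of Actions_toy, collect the row numbers of States_toy
# whose (k1, k2) equal that row's (l1, l2)
matches <- lapply(seq_len(nrow(Actions_toy)), function(r) {
  which(States_toy$k1 == Actions_toy$l1[r] &
        States_toy$k2 == Actions_toy$l2[r])
})
```

Each element of `matches` holds the States row numbers for one Actions row, which is the result the question asks for in vectorised form.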
Once you get the hang of keys and of the special symbols that may be used in the j expression of a data.table, this is relatively simple. Try this:
# First make an ID for each row for use in the `dcast`
# because you are going to have multiple rows with the
# same key values and you need to know where they came from
Actions[ , ID := 1:.N ]
# Set the keys to join on
setkeyv( Actions , c("l1" , "l2" ) )
setkeyv( States , c("k1" , "k2" ) )
# Join Actions to States. '.I' gives the row locations
# in States at which the key values of Actions are found,
# and 1:.N numbers the matches within each group
# (a repeating 1, 2, 3)
New <- States[ J(Actions) , list( ID , Ind = .I , Row = 1:.N ) ]
# k1 k2 ID Ind Row
#1: 700 100 1 1 1
#2: 700 100 1 2 2
#3: 700 100 1 3 3
#4: 700 100 2 1 1
#5: 700 100 2 2 2
#6: 700 100 2 3 3
# reshape using 'dcast.data.table'
dcast.data.table( Row ~ ID , data = New , value.var = "Ind" )
# Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27...
#1: 1 1 1 1 4 4 4 7 7 7 10 10 10 13 13 13 16 16 16 1 1 1 4 4 4 7 7 7...
#2: 2 2 2 2 5 5 5 8 8 8 11 11 11 14 14 14 17 17 17 2 2 2 5 5 5 8 8 8...
#3: 3 3 3 3 6 6 6 9 9 9 12 12 12 15 15 15 18 18 18 3 3 3 6 6 6 9 9 9...
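The code above relies on keyed joins and on grouping behaviour from the data.table release that was current when the answer was written. As a hedged alternative, here is a minimal sketch of the same idea that assumes a recent data.table version supporting the `on=` argument. It stores the row numbers in explicit columns (`ID` in Actions, `Ind` in States) instead of reading `.I` inside the join, so the result does not depend on how `.I` is evaluated there. The column name `Ind` in States and the object `Wide` are choices made for this sketch.

```r
library(data.table)

# Rebuild the example tables from the question
S.disc <- c(2000, 2000)
S.max  <- c(6200, 2300)
S.min  <- c(700, 100)
Traces.num <- 3
Class.str <- lapply(1:2, function(x) seq(S.min[x], S.max[x], S.disc[x]))
Class.inf <- seq_len(Traces.num)
Actions <- data.table(expand.grid(Class.inf, Class.str[[2]], Class.str[[1]],
                                  Class.str[[2]], Class.str[[1]])[, c(5, 4, 1, 3, 2)])
setnames(Actions, c("k1", "k2", "i", "l1", "l2"))
States <- unique(Actions[, list(k1, k2, i)])

# Record the original row numbers explicitly in both tables
Actions[, ID := .I]
States[, Ind := .I]

# Match Actions (l1, l2) against States (k1, k2) without setting keys.
# Each Actions row matches several States rows, so the result is larger
# than either input and allow.cartesian = TRUE is set deliberately.
New <- States[Actions, on = c(k1 = "l1", k2 = "l2"),
              .(ID = i.ID, Ind), allow.cartesian = TRUE]

# Number the matches within each Actions row, then reshape to wide form
setorder(New, ID, Ind)
New[, Row := seq_len(.N), by = ID]
Wide <- dcast(New, Row ~ ID, value.var = "Ind")
```

`Wide` has one column per Actions row and one row per match, like the `dcast.data.table` result above. Because no key is set, `Ind` refers to the row order States had when `Ind` was created.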
+1 for a reproducible example, which made me think a lot!