Fast merge(…, all=TRUE) with data.table in R

Is it possible to get the equivalent of merge(…, all=TRUE) using data.table syntax (such as X[Y])?

Specifically, I need a very fast way to get the following result:

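library(data.table)
# The shorter value vectors are repeated explicitly instead of relying on recycling
item_length = data.table(index = 1:10, length = rep(c(2,5,4,6,3), 2), key = "index")
item_weigth = data.table(index = c(2,4,6,7,8,11), weight = rep(c(.3,.5,.2), 2), key = "index")
merge(item_length, item_weigth, all=TRUE)
That is: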

> merge(item_length ,item_weigth , all=TRUE)
      index length weight
[1,]     1      2     NA
[2,]     2      5    0.3
[3,]     3      4     NA
[4,]     4      6    0.5
[5,]     5      3     NA
[6,]     6      2    0.2
[7,]     7      5    0.3
[8,]     8      4    0.5
[9,]     9      6     NA
[10,]    10      3     NA
[11,]    11     NA    0.2

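Sorry for answering my own question, but I think this is worth sharing:

A very fast solution seems to be updating to the latest version of data.table (1.8.0). (Thank you very much, Matthew!)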
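
To see whether an installation already includes that release, the installed version can be compared against 1.8.0. This is only a small sketch: the variable names are illustrative, and it assumes nothing beyond base R's `requireNamespace()` and `packageVersion()`.

```r
# Compare the installed data.table version with the 1.8.0 release mentioned above.
is_recent <- NA
if (requireNamespace("data.table", quietly = TRUE)) {
  installed <- packageVersion("data.table")
  is_recent <- installed >= "1.8.0"
  print(installed)
  print(is_recent)
} else {
  message("data.table is not installed")
}
```

If the check reports an older version, updating the package through the usual route (for example `install.packages("data.table")`) should bring in the faster merge.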

Here are my test data and benchmark results:

With data.table:

full_index <- 1:5000000
ratio_in_samples <- 0.8
x <- data.table(index = sample(full_index, length(full_index)*ratio_in_samples), 
                var1 = rnorm(length(full_index)*ratio_in_samples),
                key = "index")

y <- data.table(index = sample(full_index, length(full_index)*ratio_in_samples), 
                var2 = rnorm(length(full_index)*ratio_in_samples),
                key = "index")

system.time(
  result <- merge(x,y, all=TRUE)
)
user  system elapsed 
5.05    0.55    5.62
Whereas with data.frame:

full_index <- 1:5000000
ratio_in_samples <- 0.8
x <- data.frame(index = sample(full_index, length(full_index)*ratio_in_samples), 
                var1 = rnorm(length(full_index)*ratio_in_samples))

y <- data.frame(index = sample(full_index, length(full_index)*ratio_in_samples), 
                var2 = rnorm(length(full_index)*ratio_in_samples))

system.time(
  result <- merge(x,y, all=TRUE)
)
user  system elapsed 
78.83    1.75   80.67 
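
For the X[Y] form the question asks about, one possible approach is to build the union of the keys first and then join each table against it. The sketch below is not taken from the thread. It assumes a data.table version that supports the `on=` argument (1.9.6 or later), it writes the key of both toy tables as integer so the join columns share a type, and the names `all_index`, `weights_full` and `full_join` are made up for illustration.

```r
library(data.table)

# Same toy tables as in the question; the key is written as integer in both
# tables (an assumption of this sketch) so the join columns share one type.
item_length <- data.table(index = 1:10,
                          length = rep(c(2, 5, 4, 6, 3), 2),
                          key = "index")
item_weigth <- data.table(index = c(2L, 4L, 6L, 7L, 8L, 11L),
                          weight = rep(c(.3, .5, .2), 2),
                          key = "index")

# Every key value that occurs in either table.
all_index <- sort(union(item_length$index, item_weigth$index))

# Look the full key set up in one table, then join the result into the other.
# Keys missing from a table yield NA (the default nomatch behaviour).
weights_full <- item_weigth[data.table(index = all_index), on = "index"]
full_join <- item_length[weights_full, on = "index"]

print(full_join)
```

This keeps every index from both tables, which is what all=TRUE does in merge(). Whether it is faster than merge() on large inputs is not measured here, so it is worth timing both on real data.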

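merge.data.table should be very fast. Can you provide some timings? We improved its speed in the latest release. Which version of data.table are you using?

OK, I have updated to the latest version, 1.8.0, and it really is very fast! Thanks!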