What is the best way to handle multiple joined tables in R?_R_Shiny_Dplyr_Data.table - Fatal编程技术网


Suppose you have 3 tables in a star schema, something like the following, where dt1 is a dummy table that depends on dt2 and dt3.

library(data.table)
One way is to write a function that handles joins like these:

dt1[dt2]
dt2[dt1]
dt1[dt3]
dt3[dt1]
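The bare `dt1[dt2]` form only works when both tables are keyed. A data.table holds a single key at a time, and `X[Y]` matches Y's key against the leading key column(s) of X. The following is a minimal sketch with hypothetical stand-in tables. The key columns `x1`, `x2`, `y1` and `y3` follow the answers below, while `v`, `x_label`, `y_label` and every value are invented. It shows that switching dimensions forces a new `setkey()`, whereas `on=` names the join columns per call:

```r
library(data.table)

# Hypothetical stand-ins for the question's tables (its data is not shown):
# dt1$x1 refers to dt2$x2 and dt1$y1 refers to dt3$y3.
dt1 <- data.table(x1 = c(1L, 2L, 2L), y1 = c("a", "b", "a"), v = c(10, 20, 30))
dt2 <- data.table(x2 = c(1L, 2L), x_label = c("one", "two"))
dt3 <- data.table(y3 = c("a", "b"), y_label = c("alpha", "beta"))

# Keyed join: X[Y] matches Y's key against X's key, so both need setkey().
setkey(dt1, x1)
setkey(dt2, x2)
keyed <- dt1[dt2]

# Joining the other dimension means re-keying dt1 (setkey reorders it in place).
setkey(dt1, y1)
setkey(dt3, y3)
keyed3 <- dt1[dt3]

# on= names the join columns for this call only and leaves dt1's key untouched.
on_join <- dt1[dt2, on = c(x1 = "x2")]
key(dt1)   # still "y1"
```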

rememberJoin

You could keep all of the join lookups together in a single list:

Join <- list(
dt1dt2 = c( x1 = "x2"),
dt2dt1 = c( x2 = "x1"),
dt3dt1 = c( y3 = "y1"),
dt1dt3 = c( y1 = "y3")
)

dt1[dt2, on = Join$dt1dt2]
dt2[dt1, on = Join$dt2dt1]
dt3[dt1, on = Join$dt3dt1]
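The stored mappings can also be picked up automatically. The following is a minimal sketch that assumes a hypothetical helper `join_with()`, which is not part of data.table. It builds the list name from the two argument names (`dt1`, `dt2` gives `"dt1dt2"`) and hands the stored mapping to `on=`. The tables are invented stand-ins:

```r
library(data.table)

# Hypothetical stand-in tables (the question's data is not shown).
dt1 <- data.table(x1 = c(1L, 2L, 2L), y1 = c("a", "b", "a"), v = c(10, 20, 30))
dt2 <- data.table(x2 = c(1L, 2L), x_label = c("one", "two"))
dt3 <- data.table(y3 = c("a", "b"), y_label = c("alpha", "beta"))

Join <- list(
  dt1dt2 = c(x1 = "x2"),
  dt2dt1 = c(x2 = "x1"),
  dt3dt1 = c(y3 = "y1"),
  dt1dt3 = c(y1 = "y3")
)

# Hypothetical helper: the list name is built from the argument names,
# e.g. join_with(dt1, dt2) looks up Join[["dt1dt2"]].
join_with <- function(x, i, joins = Join) {
  name <- paste0(deparse(substitute(x)), deparse(substitute(i)))
  cols <- joins[[name]]
  if (is.null(cols)) stop("no join stored for ", name)
  x[i, on = cols]
}

a <- join_with(dt1, dt2)   # same as dt1[dt2, on = Join$dt1dt2]
b <- join_with(dt3, dt1)   # same as dt3[dt1, on = Join$dt3dt1]
```

Because the name is read from the arguments, this only works when the tables are passed as bare symbols such as `dt1`, not as expressions.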

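If you must have a single function call, this one takes advantage of R's lazy evaluation: substitute() captures the names of the tables you pass in. It requires you to supply the lookup vectors (dt1_lookup and dtx_lookup) in advance, but it is closer to what your question asks for. Note: to reduce the number of logical statements, this solution always reorders its inputs. That is, rememberJoin2(dt1, dt2) evaluates to the same result as rememberJoin2(dt2, dt1); otherwise it looked too messy.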
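The name capture and reordering that rememberJoin2 relies on can be seen in isolation. The following is a minimal sketch with a hypothetical helper `canonical_pair()`. Function arguments are lazy promises, so `substitute()` returns the symbol the caller typed without evaluating it, and `dt1`, `dt2` and `dt3` need not even exist here. Sorting by the number after `dt` then makes the argument order irrelevant:

```r
# Hypothetical helper mirroring rememberJoin2's reordering step.
canonical_pair <- function(t1, t2) {
  # substitute() reads the unevaluated argument; the promise is never forced
  nms <- c(deparse(substitute(t1)), deparse(substitute(t2)))
  # "dt1" -> 1, "dt3" -> 3
  nums <- as.integer(sub("dt", "", nms, fixed = TRUE))
  if (nums[1] == nums[2]) stop("must provide different data.tables to join")
  # lower-numbered table first, whatever order the caller used
  nms[order(nums)]
}

canonical_pair(dt2, dt1)   # returns c("dt1", "dt2")
canonical_pair(dt1, dt3)   # returns c("dt1", "dt3")
```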

dt_dummy = dt1
dt_lookup = rbindlist(list(dt2, dt3[, .(y2 = y3, x2 = x3)]), idcol = 'ID')

dt_dummy[dt_lookup, on = .(x1 = x2), nomatch = 0L]
dt_dummy[dt_lookup, on = .(y1 = x2), nomatch = 0L]
dt1_lookup = c('x1', 'y1')
dtx_lookup = c('x2', 'y3')
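The two lookup vectors are indexed by `max(n) - 1`. With dt1 always on one side, the other table number is 2 or 3, which selects position 1 (`x1`/`x2`) or position 2 (`y1`/`y3`). The following is a minimal sketch of the same lookup done with strings instead of `substitute()`. It uses a hypothetical `on_condition()` helper and invented stand-in tables, and relies on data.table's `on=` accepting conditions written as strings such as `"x1==x2"`:

```r
library(data.table)

# Hypothetical stand-in tables (the question's data is not shown).
dt1 <- data.table(x1 = c(1L, 2L, 2L), y1 = c("a", "b", "a"), v = c(10, 20, 30))
dt2 <- data.table(x2 = c(1L, 2L), x_label = c("one", "two"))
dt3 <- data.table(y3 = c("a", "b"), y_label = c("alpha", "beta"))

# Lookup vectors from the answer: position i pairs a dt1 column with the
# matching column of table number i + 1 (dt2 or dt3).
dt1_lookup <- c('x1', 'y1')
dtx_lookup <- c('x2', 'y3')

# Hypothetical helper: n holds the two table numbers, one of which is 1,
# so max(n) - 1 is 1 for dt2 and 2 for dt3.
on_condition <- function(n) {
  i <- max(n) - 1
  paste0(dt1_lookup[i], "==", dtx_lookup[i])
}

on_condition(c(1L, 2L))                        # "x1==x2"
r2 <- dt1[dt2, on = on_condition(c(1L, 2L))]   # like dt1[dt2, on = .(x1 == x2)]
r3 <- dt1[dt3, on = on_condition(c(3L, 1L))]   # like dt1[dt3, on = .(y1 == y3)]
```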

I just found a package related to this same problem. I am mentioning it here only for reference.


It keeps a record of all the joins between local R data frames. I still have to take a proper look at it.

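Could you elaborate on why there are 4 combinations? What is the logic by which y3 matches y1 but x1 matches x2?

This is just dummy data I created to show that dt1 depends on dt2 and dt3. Each pair of tables can be joined in 2 directions, which gives 4 joins: dt1[dt2]; dt2[dt1]; dt1[dt3]; dt3[dt1]. All I am trying to do right now is explore the data. It is a snapshot of the entire database I have access to, which I imported from multiple CSVs.

You can check the dwtools package, which has a batch join function. There is also an FR for this in the data.table package. You can add your use case there so we can make sure it is not missed when the feature is implemented.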
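For exploring a star schema, it can help to attach every dimension to the fact table in one pass. The following is a minimal sketch of such a batch join with `Reduce()` and `merge()`. It is not the dwtools API, and the `dims` registry, the `denormalize()` helper and the tables are all hypothetical:

```r
library(data.table)

# Hypothetical stand-in tables (the question's data is not shown).
dt1 <- data.table(x1 = c(1L, 2L, 2L), y1 = c("a", "b", "a"), v = c(10, 20, 30))
dt2 <- data.table(x2 = c(1L, 2L), x_label = c("one", "two"))
dt3 <- data.table(y3 = c("a", "b"), y_label = c("alpha", "beta"))

# Hypothetical registry: each dimension with the fact column and the
# dimension column it joins on.
dims <- list(
  list(table = dt2, fact_col = "x1", dim_col = "x2"),
  list(table = dt3, fact_col = "y1", dim_col = "y3")
)

# Left-join every dimension onto the fact table, one after another.
# all.x = TRUE keeps every fact row; unmatched dimension columns become NA.
denormalize <- function(fact, dims) {
  Reduce(function(acc, d) {
    merge(acc, d$table, by.x = d$fact_col, by.y = d$dim_col, all.x = TRUE)
  }, dims, init = fact)
}

wide <- denormalize(dt1, dims)
```

Adding a dimension then means adding one entry to the registry rather than writing another pair of join calls.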

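# rememberJoin2: join dt1 with dt2 or dt3 using the key pairs in dt1_lookup / dtx_lookup
rememberJoin2 <- function(t1, t2){
  # capture the unevaluated arguments (e.g. the symbols dt1 and dt2)
  l = list(substitute(t1), substitute(t2))

  # extract the number from each name: dt1 -> 1, dt3 -> 3
  n <- vapply(l, function(x) as.integer(gsub("dt", "", deparse(x))), 0L)

  if(n[1] == n[2]) stop('must provide different data.tables to join')

  # with two inputs, rank() makes the lower-numbered table X and the other Y
  r <- rank(n)

  # build X[Y, on = .(Xkey == Ykey)] from the lookup vectors (assumes one side is dt1)
  # and evaluate it in the caller's environment, where the tables are defined
  eval(
    substitute(X[Y, on = .(Xkey == Ykey)],
               list(X = l[[r[1]]],
                    Y = l[[r[2]]],
                    Xkey = as.name(dt1_lookup[max(n) - 1]),
                    Ykey = as.name(dtx_lookup[max(n) - 1])
                    )
               ),
    envir = parent.frame()
    )
}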

rememberJoin2(dt1, dt2)
rememberJoin2(dt2, dt1)