Efficiently look up data from one data.frame in another data.frame by group

Tags: r, performance, dataframe, data.table, lookup

I am looking for a faster solution to the following problem.

Suppose I have the following two datasets:

df1 <- data.frame(Var1 = c(5011, 2484, 4031, 1143, 7412),
              Var2 = c(2161, 2161, 2161, 2161, 8595))
df2 <- data.frame(team=c("A","A", "B", "B", "B", "C", "C", "D", "D"),
              class=c("5011", "2161", "2484", "4031", "1143", "2161", "5011", "8595", "1143"),
              attribute=c("X1", "X2", "X1", "Z1", "Z2", "Y1", "X1", "Z1", "X2"),
              stringsAsFactors=FALSE)


> df1
  Var1 Var2
1 5011 2161
2 2484 2161
3 4031 2161
4 1143 2161
5 7412 8595

> df2
  team class attribute
1    A  5011        X1
2    A  2161        X2
3    B  2484        X1
4    B  4031        Z1
5    B  1143        Z2
6    C  2161        Y1
7    C  5011        X1
8    D  8595        Z1
9    D  1143        X2
The real data consists of millions of rows in both df1 and df2.

How can I do this more efficiently? Perhaps by combining an apply approach with data.table?

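I am not quite sure what your rule is trying to achieve.

Judging from the sample data, code and output, you may want to join df2 to df1 once per column of df1 (once on Var1, once on Var2) and then inner join the two results on team: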

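library(data.table)
setDT(df1)
setDT(df2)
# df2's lookup column is called `class` in the question; rename it to `cls`
# (the name used below) and convert it from character to integer
setnames(df2, "class", "cls")
df2[, cls := as.integer(cls)]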

#left join df1 with df2 using Var1
v1 <- df2[df1, on=.(cls=Var1)]

#left join df1 with df2 using Var2
v2 <- df2[df1, on=.(cls=Var2)]

#inner join the 2 previous results to ensure that the same team is picked 
#where classes already match in v1 and v2
v1[v2, on=.(team, cls=Var1, Var2=cls), nomatch=0L]

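Comments:

Why not merge or join? What is your expected output? merge(df1, df2, by.x = "Var1", by.y = "class").

Can you clarify why B does not match? It looks like it should.

Can you be clearer about your expected result? Perhaps provide more cases showing what would and would not be included in the result.

2484, 4031 and 1143 all appear in df1. What do you mean when you say that B does not qualify on the classes that appear in df1? What if B had 2484, 999, 2161? Would it be in or out?

Team B does not qualify on classes that together form a row in df1, so it does not meet the criterion.

This edit is still unclear for the data you provided. What is the rule for inclusion/exclusion?

Thanks! This works very well and is much faster than the original code.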
Output of the final join in the answer above:

   team  cls attribute Var2 i.attribute
1:    A 5011        X1 2161          X2
2:    C 5011        X1 2161          Y1
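
For cross-checking the result on small data, the same two-lookups-then-inner-join idea can be sketched in base R with merge(), as one of the comments suggested. This is only a sketch, and it assumes the intended rule is "keep a team only if it has both the Var1 class and the Var2 class of the same df1 row", which the question never states explicitly. The names m1, m2, res and the suffixes .Var1 / .Var2 are illustrative and not part of the original answer.

```r
# Sample data from the question
df1 <- data.frame(Var1 = c(5011, 2484, 4031, 1143, 7412),
                  Var2 = c(2161, 2161, 2161, 2161, 8595))
df2 <- data.frame(team = c("A", "A", "B", "B", "B", "C", "C", "D", "D"),
                  class = c("5011", "2161", "2484", "4031", "1143",
                            "2161", "5011", "8595", "1143"),
                  attribute = c("X1", "X2", "X1", "Z1", "Z2",
                                "Y1", "X1", "Z1", "X2"),
                  stringsAsFactors = FALSE)

# class is stored as character in df2; make it numeric-like so it matches df1
df2$class <- as.integer(df2$class)

# Look up the teams that have the Var1 class, and the teams that have the Var2 class
m1 <- merge(df1, df2, by.x = "Var1", by.y = "class")
m2 <- merge(df1, df2, by.x = "Var2", by.y = "class")

# Assumed rule: keep only (df1 row, team) combinations present in both lookups
res <- merge(m1, m2, by = c("Var1", "Var2", "team"),
             suffixes = c(".Var1", ".Var2"))
print(res)
```

On the sample data this keeps teams A and C for the row (5011, 2161), in line with the data.table output above. For millions of rows the data.table joins are probably the better choice; the base R version is mainly useful for checking the logic.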