Match list-column elements with rows in another data.frame


I have the following two data.frames:

df1 <- data.frame(Var1=c(3,4,8,9),
               Var2=c(11,32,1,7))

> df1
  Var1 Var2
1    3   11
2    4   32
3    8    1
4    9    7

df2 <- data.frame(ID=c('A', 'B', 'C'),
                ball=I(list(c("3","11", "12"), c("4","1"), c("9","32"))))

> df2
  ID      ball
1  A 3, 11, 12
2  B      4, 1
3  C     9, 32
Does anyone know how to do this efficiently? The original data consist of millions of rows in each of the two data.frames.

A data.table solution would run much faster than this base R one, but here is one possibility.

Your data:

df1 <- data.frame(Var1=c(3,4,8,9),
                  Var2=c(11,32,1,7))
df2 <- data.frame(ID=c('A', 'B', 'C'),
                  ball=I(list(c("3","11", "12"), c("4","1"), c("9","32"))))
Procedure:

df2$ID <- as.character(df2$ID) # just in case the IDs are factor levels

# Upper bound on the number of matches: every df1 row could match every df2 row.
# (length(df2) is the number of columns, which is too small in general.)
n <- nrow(df1) * nrow(df2)
df3 <- data.frame(ID = character(n),
                  Var1 = numeric(n), Var2 = numeric(n),
                  stringsAsFactors = FALSE) # to make sure we get the ID as a string
count <- 0 # number of matches found so far
for(i in seq_len(nrow(df1))){
  for(j in seq_len(nrow(df2))){
    # TRUE when both Var1 and Var2 of row i occur in the j-th ball vector
    if(all(as.character(unlist(df1[i,])) %in% df2$ball[[j]])){
      count <- count + 1
      df3$ID[count] <- df2$ID[j]
      df3$Var1[count] <- df1$Var1[i]
      df3$Var2[count] <- df1$Var2[i]
    }
  }
}
df3_final <- df3[seq_len(count),] # keep only the rows that were filled
df3_final
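
A possible middle ground, sketched here rather than taken from the answer: keep the loop over df2 but test all rows of df1 at once with a vectorised %in%, so the number of R-level iterations drops from nrow(df1) * nrow(df2) to nrow(df2). The names v1, v2, pieces and res are illustrative, and the sketch assumes, like the loop above, that df1 has exactly the two columns Var1 and Var2.

```r
df1 <- data.frame(Var1 = c(3, 4, 8, 9), Var2 = c(11, 32, 1, 7))
df2 <- data.frame(ID = c("A", "B", "C"),
                  ball = I(list(c("3", "11", "12"), c("4", "1"), c("9", "32"))),
                  stringsAsFactors = FALSE)

# Compare as strings, because the ball vectors hold character values
v1 <- as.character(df1$Var1)
v2 <- as.character(df1$Var2)

# One iteration per df2 row; each iteration checks every df1 row at once
pieces <- lapply(seq_len(nrow(df2)), function(j) {
  b <- df2$ball[[j]]
  hit <- v1 %in% b & v2 %in% b          # rows whose two values are both in b
  if (!any(hit)) return(NULL)           # no match for this ID
  data.frame(ID = df2$ID[j], df1[hit, , drop = FALSE],
             stringsAsFactors = FALSE)
})

# Stack the non-empty pieces; res is NULL when nothing matched at all
res <- do.call(rbind, Filter(Negate(is.null), pieces))
res
```

Because the result is assembled from per-ID pieces, no preallocation or trimming step is needed, and an ID that matches several df1 rows simply contributes several rows.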

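Can you tell us what max(sapply(df2$ball, length)) is? Can there be multiple matches?
df2$ball can have up to 45 elements. These elements can match multiple rows in df1.
Does this work? I assumed df1 has only 2 variables.
This works well. I'll leave the question open for a while to see whether others have ideas too. Many thanks.
Out of curiosity, how long does it take to run this on large data frames?
I haven't tried it on the operational data set yet. But on a data set of 60,000+ rows it takes 220 seconds on my machine.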
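
Given the run time mentioned in the comments, one loop-free alternative is to reshape the list column into a long lookup table and join twice with base R's merge. This is only a sketch under assumptions not stated in the thread: the ball values are numeric strings, and df1 has just the columns Var1 and Var2. The names long, m1 and res are illustrative. The same reshape-and-join idea could presumably be written with data.table, which is not shown here.

```r
df1 <- data.frame(Var1 = c(3, 4, 8, 9), Var2 = c(11, 32, 1, 7))
df2 <- data.frame(ID = c("A", "B", "C"),
                  ball = I(list(c("3", "11", "12"), c("4", "1"), c("9", "32"))),
                  stringsAsFactors = FALSE)

# Long format: one row per (ID, ball value) pair
long <- data.frame(ID = rep(as.character(df2$ID), lengths(df2$ball)),
                   ball = as.numeric(unlist(df2$ball)),  # assumes numeric strings
                   stringsAsFactors = FALSE)
long <- unique(long)   # duplicates inside a ball vector would duplicate matches

# Step 1: every (df1 row, ID) pair where Var1 occurs in that ID's ball vector
m1 <- merge(df1, long, by.x = "Var1", by.y = "ball")

# Step 2: keep only the pairs where Var2 occurs in the same ID's ball vector
res <- merge(m1, long, by.x = c("Var2", "ID"), by.y = c("ball", "ID"))
res <- res[, c("ID", "Var1", "Var2")]
res
```

A df1 row that matches several IDs appears once per ID, which is the multiple-match behaviour asked about in the comments. Whether this is fast enough for millions of rows depends on how many intermediate pairs step 1 produces, so it is worth timing on a sample first.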