R: SQL queries on data frames in a list (tags: r, list, dataframe, data.table, sqldf)


Given the data frames

df1 <- data.frame(CustomerId=c(1:6),Product=c(rep("Toaster",3),rep("Radio",3)))
df2 <- data.frame(CustomerId=c(2,4,6),State=c(rep("Alabama",2),rep("Ohio",1)))

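If you copy the data.frames from the list into a new environment, you can pass that environment to sqldf through its envir argument. Alternatively, name the elements of the list and use with.
Note the following: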

I used list rather than c to create dflist.

Note the difference:
str(c(df1,df2))
##List of 4
## $ CustomerId: int [1:6] 1 2 3 4 5 6
## $ Product   : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1
## $ CustomerId: num [1:3] 2 4 6
## $ State     : Factor w/ 2 levels "Alabama","Ohio": 1 1 2

str(list(df1,df2))
##List of 2
## $ :'data.frame': 6 obs. of  2 variables:
##  ..$ CustomerId: int [1:6] 1 2 3 4 5 6
##  ..$ Product   : Factor w/ 2 levels "Radio","Toaster": 2 2 2 1 1 1
## $ :'data.frame': 3 obs. of  2 variables:
##  ..$ CustomerId: num [1:3] 2 4 6
##  ..$ State     : Factor w/ 2 levels "Alabama","Ohio": 1 1 2
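The difference shown by str above can also be checked programmatically. The following is a minimal sketch in base R; the names flat and nested are illustrative and do not appear in the original answer.

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))

flat   <- c(df1, df2)     # concatenates the columns of both data frames
nested <- list(df1, df2)  # keeps the two data frames whole

length(flat)                   # counts columns across both inputs
length(nested)                 # counts data frames
sapply(flat, is.data.frame)    # no element is a data frame any more
sapply(nested, is.data.frame)  # every element is still a data frame
```

Because c flattens the data frames into their columns, there is no longer a table-like object for a query to refer to, which is why list is the better choice here.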


I have adjusted the SQL query to reflect the names of the data.frames (following your second approach).

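Naming the data: another alternative, using proto

Thanks to @G.Grothendieck (see the comments).

This uses the proto package, which is loaded along with sqldf.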

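# assumes sqldf is attached via library(sqldf), which also attaches proto
# name the list elements after the tables used in the query
dflist <- list(df1 = df1, df2 = df2)
sqldf("select a.CustomerId, a.Product, b.State from df1 a
       inner join df2 b on b.CustomerId = a.CustomerId",
      envir = as.proto(dflist))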

I have shown how to use assign to copy the list elements, and how mapply can replace the for loop.

library(sqldf)

dflist <- list(df1,df2)
names(dflist) <- c('df1','df2')

# create a new environment

e <- new.env()
# assign the elements of dflist to this new environment
for(.x in names(dflist)){
  assign(value = dflist[[.x]], x=.x, envir = e)
}

# this could also be done using mapply / lapply
# eg
# invisible(mapply(assign, value = dflist, x = names(dflist), MoreArgs =list(envir = e)))
# run the sql query
sqldf("select a.CustomerId, a.Product, b.State from df1 a
          inner join df2 b on b.CustomerId = a.CustomerId", envir = e)

##  CustomerId Product   State
## 1          2 Toaster Alabama
## 2          4   Radio Alabama
## 3          6   Radio    Ohio
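As a possible shortcut for the copy step, base R's list2env places every named element of a list into an environment in one call. This is a sketch of that idea rather than part of the original answer. It only builds and inspects the environment; the resulting e could then be passed as envir = e to the sqldf call shown above.

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))
dflist <- list(df1 = df1, df2 = df2)

# list2env() copies each named list element into the given environment
e <- list2env(dflist, envir = new.env())

ls(e)                  # the names a query run with envir = e can refer to
identical(e$df1, df1)  # the copy matches the original data frame
```

The list must be named for this to work, since the element names become the variable names in the environment.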

# this is far simpler!!
with(dflist,sqldf("select a.CustomerId, a.Product, b.State from df1 a
           inner join df2 b on b.CustomerId = a.CustomerId"))

# the same join with data.table: key both tables on CustomerId, then use df1[df2]
library(data.table)
dflist <- list(data.table(df1),data.table(df2))
names(dflist) <- c('df1','df2')
invisible(lapply(dflist, setkeyv, 'CustomerId'))
with(dflist, df1[df2])
##    CustomerId Product   State
## 1:          2 Toaster Alabama
## 2:          4   Radio Alabama
## 3:          6   Radio    Ohio
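To cross-check the result without sqldf or data.table, the same inner join can be sketched in base R by folding merge over the list. This is an illustrative alternative, not something the original answer proposes; the name joined is hypothetical.

```r
df1 <- data.frame(CustomerId = 1:6,
                  Product = c(rep("Toaster", 3), rep("Radio", 3)))
df2 <- data.frame(CustomerId = c(2, 4, 6),
                  State = c(rep("Alabama", 2), rep("Ohio", 1)))
dflist <- list(df1 = df1, df2 = df2)

# merge() performs an inner join by default; Reduce() applies it pairwise
# across the list, so this also works for lists of more than two data frames
joined <- Reduce(function(x, y) merge(x, y, by = "CustomerId"), dflist)
joined
```

With the two data frames above, joined holds the three customers that appear in both tables, matching the output of the SQL and data.table versions.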