
Accessing sharded data distributed across multiple Postgres servers in R

Tags: sql, r, postgresql, dplyr

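I have 3 sharded databases on 3 different Postgres servers. I am trying to connect to these servers and write SQL that returns values in R. I can connect and run the first query, but I need the results from the data in all three tables. How should I do this?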

require("RPostgreSQL")
library(DBI)
library('dplyr')

# password
pw <- "postgres"

# loads the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

# creates a connection to the postgres database
con1 <- dbConnect(
  drv,
  dbname = "postgres",
  host = "0.0.0.0",
  port = 5436,
  user = "postgres",
  password = pw
)
con2 <- dbConnect(
  drv,
  dbname = "postgres",
  host = "0.0.0.0",
  port = 5431,
  user = "postgres",
  password = pw
)
con3 <- dbConnect(
  drv,
  dbname = "postgres",
  host = "0.0.0.0",
  port = 5436,
  user = "postgres",
  password = pw
)
rm(pw) # removes the password


# check for connection
dbExistsTable(con1, "shard1")
dbExistsTable(con2, "shard2")
dbExistsTable(con3, "shard3")
# TRUE

# the amount of paid installs by company, which happened in May
query = "SELECT company, SUM(installs)
FROM shard1
WHERE paid= 'TRUE' AND to_char(created_at,'mm')='05'
GROUP BY company"
dsub = tbl(con1, sql(query))
dsub
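Simply row-bind all the resulting data frames. Since the names follow a 1-2-3 pattern, use get() on the connection objects and string interpolation for the table name in the SQL query, building both names dynamically with paste0: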

# RETURN LIST DATA FRAMES FOR EACH CONNECTION
df_list <- lapply(c(1:3), function(i) {

      query <- "SELECT company, SUM(installs) AS total_installs
                FROM %s
                WHERE paid = 'TRUE' 
                  AND to_char(created_at,'mm')='05'
                GROUP BY company"

      dbGetQuery(get(paste0("con", i)), sprintf(query, paste0("shard", i)))
})

final_df <- do.call(rbind, df_list)     # BASE R CHAIN APPEND METHOD
# final_df <- bind_rows(df_list)        # DPLYR CHAIN APPEND METHOD
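
To check which table each connection will be asked for, it can help to build the interpolated query strings on their own first, without touching the database. The following is only a sketch of that idea: shard_tables and queries are illustrative names that do not appear in the original code, and the table names shard1 to shard3 are taken from the question.

```r
# Same template as above; %s is the placeholder for the table name
query <- "SELECT company, SUM(installs) AS total_installs
FROM %s
WHERE paid = 'TRUE'
  AND to_char(created_at,'mm')='05'
GROUP BY company"

# Illustrative names (not in the original answer): one table per shard
shard_tables <- paste0("shard", 1:3)

# Build one query string per shard; the result is named by table
queries <- vapply(shard_tables,
                  function(tbl) sprintf(query, tbl),
                  character(1))

# Inspect the query that would be sent over con2
cat(queries[["shard2"]], "\n")
```

Each element of queries could then be passed to dbGetQuery() together with the matching connection, which makes a mismatch such as querying shard1 over con2 easy to spot before anything runs.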

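Thanks for your answer, but I get an error: relation "shard1" does not exist, LINE 2: FROM shard1. On con2 it should be shard2, and on con3 it should be shard3. How should I modify it?

See the edit. The table name in the SQL query is now interpolated dynamically.

Super! Only rbind does not take care of the sum. I changed it to: final_df %>% group_by(country) %>% summarise_all(funs(sum(., na.rm = TRUE))). Thanks a lot...

If the data frame columns have exactly the same names across the list of data frames, rbind should work.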
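
The point raised in the comments can be illustrated without a database: row-binding only stacks the per-shard results, so if the same company shows up in more than one shard (an assumption about the data, not something the question states), its partial sums still need to be added up. The sketch below uses made-up data frames in place of the dbGetQuery() results and base R's aggregate() rather than the dplyr pipeline from the comment.

```r
# Hypothetical per-shard results standing in for the dbGetQuery() output
df_list <- list(
  data.frame(company = c("A", "B"), total_installs = c(10, 5)),
  data.frame(company = c("A", "C"), total_installs = c(7, 2)),
  data.frame(company = "B",         total_installs = 1)
)

# Stack the shards; companies A and B now appear on two rows each
stacked <- do.call(rbind, df_list)

# Add up the partial sums so each company ends up on a single row
final_df <- aggregate(total_installs ~ company, data = stacked, FUN = sum)
print(final_df)
```

If every company lives in exactly one shard, the aggregation step changes nothing and the plain rbind result is already final.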