AWS Glue: reading from an RDS database that is in a VPC

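I have an RDS database that sits inside a VPC. My end goal is to run a nightly job that pulls data from RDS and stores it in Redshift. I am currently using Glue with Glue connections. I can connect to RDS/Redshift with the following lines: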

datasource2 = DynamicFrame.fromDF(dfFinal, glueContext, "scans")

output = glueContext.write_dynamic_frame.from_jdbc_conf(frame = datasource2, catalog_connection = "MPtest", connection_options = {"database" : "app", "dbtable" : "scans"})

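where dfFinal is my final DataFrame after a series of transformations that are not essential to this post. That code works fine, but I would like to modify it so that I can read a table from RDS into a DataFrame.

Since the RDS database is in a VPC, I would like to use the catalog_connection parameter, but the DynamicFrameReader class has no from_jdbc_conf method, so there is no obvious way to use my Glue connection.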

I have seen some posts saying you can use an approach like this:

url = "jdbc:postgresql://host/dbName"
properties = {
    "user" : "user",
    "password" : "password"
}
df = spark.read.jdbc(url=url, table="table", properties=properties)

But when I try it, the connection times out because the database is not publicly accessible. Any suggestions?

Your approach of using a Glue connection is the right one.

Define a Glue connection of type JDBC for your Postgres instance:

Type    JDBC

JDBC URL    jdbc:postgresql://<RDS ip>:<RDS port>/<database_name>

VPC Id  <VPC of RDS instance>

Subnet  <subnet of RDS instance>

Security groups <Security Group allowed to connect to RDS>
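
The same connection settings can also be expressed as a payload for the Glue API instead of being entered in the console. The sketch below only builds that payload; the helper name build_connection_input and every value passed to it (host, subnet, security group, availability zone, credentials) are placeholders, not taken from the question. It assumes the boto3 create_connection call, whose physical requirements take a subnet, security groups and an availability zone rather than a VPC id.

```python
def build_connection_input(name, jdbc_url, username, password,
                           subnet_id, security_group_ids, availability_zone):
    """Build a ConnectionInput payload for a JDBC-type Glue connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # The VPC is implied by the subnet, so it is not listed separately.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


# All values below are hypothetical placeholders.
connection_input = build_connection_input(
    name="MPtest",
    jdbc_url="jdbc:postgresql://mydb.example.internal:5432/app",
    username="user",
    password="password",
    subnet_id="subnet-00000000",
    security_group_ids=["sg-00000000"],
    availability_zone="us-east-1a",
)

# With AWS credentials configured, the payload could be submitted with:
#   import boto3
#   boto3.client("glue").create_connection(ConnectionInput=connection_input)
```

In practice the password would come from a secret store rather than being written into the script.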

Then create a dynamic frame that reads from the table, passing an options dictionary as in the call below:


table_ddf = glueContext.create_dynamic_frame.from_options(
    connection_type='postgresql',
    connection_options=options,
    transformation_ctx=transformation_ctx
)
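
The call above relies on an options dictionary and a transformation_ctx string that are not shown. A minimal sketch of what they might look like follows, assuming the standard JDBC connection options (url, dbtable, user, password). The helpers build_jdbc_options and read_table are hypothetical names, and the host, port and credentials are placeholders. It also assumes the Glue connection is attached to the job, so that the job runs inside the VPC and can reach the private RDS endpoint.

```python
def build_jdbc_options(host, port, database, table, user, password):
    """Build the connection_options dict for a PostgreSQL JDBC read."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(glue_context, options, transformation_ctx):
    """Read one table into a DynamicFrame using the given GlueContext."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="postgresql",
        connection_options=options,
        transformation_ctx=transformation_ctx,
    )


# Placeholder values; replace them with the real RDS endpoint and credentials.
options = build_jdbc_options(
    host="mydb.example.internal",
    port=5432,
    database="app",
    table="scans",
    user="user",
    password="password",
)

# Inside a Glue job, where glueContext already exists:
#   table_ddf = read_table(glueContext, options, "read_scans")
#   df = table_ddf.toDF()  # convert to a Spark DataFrame if needed
```

Passing the GlueContext in as an argument keeps the helper free of Glue imports, so the options logic can be checked outside a Glue job.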