Python: load a Spark SQL table into a database table


Is there a way to load a Spark SQL table into a database table as-is, the way we would in SQL:

insert into database_table select * from sparksql_table;

pg_hook = PostgresHook(postgres_conn_id="ingestion_db", schema="ingestiondb")

connection = pg_hook.get_conn()

cursor = connection.cursor()

spark = SparkSession \
    .builder \
    .appName("Spark csv schema inference") \
    .config("spark.sql.warehouse.dir", warehouse_location) \
    .enableHiveSupport() \
    .getOrCreate()
I can run this:

spark.sql("select * from MetadataTable").show()

But not this:


cursor.execute("select * from MetadataTable")

Open a spark shell with the PostgreSQL driver package:

spark-shell --packages org.postgresql:postgresql:42.1.1

import java.util.Properties

val url = "jdbc:postgresql://localhost:5432/dbname"

def getProperties: Properties = {
  val prop = new Properties
  prop.setProperty("user", "dbuser")
  prop.setProperty("password", "dbpassword")
  prop.setProperty("driver", "org.postgresql.Driver")
  prop
}

val df = spark.sql("""select * from table""")

df.write.mode("append").option("driver", "org.postgresql.Driver").jdbc(url, "tablename", getProperties)

After that you can check the table in the postgres database. Also, have a look at the different save modes Spark offers and pick the one that suits your case.

Here is the Python equivalent of @Ghost9's answer.

Initialize the Spark session with the postgres driver package as follows (check for the correct version):

spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.2.5") \
    .getOrCreate()
Note that spark.jars.packages takes Maven coordinates, not a bare jar filename (use spark.jars if you have a local jar instead). Then you can write to the database over JDBC with the following function:

def connect_to_sql(
        spark, df, jdbc_hostname, jdbc_port, database, data_table, username, password
):
    jdbc_url = "jdbc:postgresql://{0}:{1}/{2}".format(jdbc_hostname, jdbc_port, database)

    connection_details = {
        "user": username,
        "password": password,
        "driver": "org.postgresql.Driver",
    }

    df.write.jdbc(url=jdbc_url, table=data_table, mode="append", properties=connection_details)
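For reference, the JDBC URL string that connect_to_sql builds has this shape (the hostname, port, and database name below are placeholder values, not ones from your environment):

```python
# Placeholder connection details, for illustration only.
jdbc_hostname, jdbc_port, database = "localhost", 5432, "ingestiondb"

# Same format string as in connect_to_sql above.
jdbc_url = "jdbc:postgresql://{0}:{1}/{2}".format(jdbc_hostname, jdbc_port, database)
print(jdbc_url)  # jdbc:postgresql://localhost:5432/ingestiondb
```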
The available modes are:

    append: Append contents of this DataFrame to existing data.
    overwrite: Overwrite existing data.
    ignore: Silently ignore this operation if data already exists.
    error (default case): Throw an exception if data already exists.
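If you want to catch a typo in the mode name before kicking off a write, a minimal sketch (the check_mode helper is an assumption for illustration, not part of Spark's API):

```python
# Hypothetical helper: validate a save mode before passing it to df.write.jdbc.
# These are the modes listed above; "errorifexists" is an alias for "error"
# accepted by recent Spark versions.
VALID_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}

def check_mode(mode):
    """Return mode unchanged if it is a known save mode, else raise ValueError."""
    if mode not in VALID_MODES:
        raise ValueError("unknown save mode: {!r}".format(mode))
    return mode
```

For example, df.write.jdbc(url=jdbc_url, table=data_table, mode=check_mode("append"), properties=connection_details) fails fast on a misspelled mode instead of surfacing the error inside Spark.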