Python: load a Spark SQL table into a database table
Is there a way to load a Spark SQL table into a database table as-is, the way we would in SQL:

insert into database_table select * from sparksql_table
pg_hook = PostgresHook(postgres_conn_id="ingestion_db", schema="ingestiondb")
connection = pg_hook.get_conn()
cursor = connection.cursor()
spark = SparkSession \
.builder \
.appName("Spark csv schema inference") \
.config("spark.sql.warehouse.dir", warehouse_location) \
.enableHiveSupport() \
    .getOrCreate()
I can run the following:
spark.sql("select * from MetadataTable").show()
but not this:
cursor.execute("select * from MetadataTable")

Open a spark shell with the postgres driver package:

spark-shell --packages org.postgresql:postgresql:42.1.1
import java.util.Properties

val url = "jdbc:postgresql://localhost:5432/dbname"

def getProperties: Properties = {
  val prop = new Properties
  prop.setProperty("user", "dbuser")
  prop.setProperty("password", "dbpassword")
  prop.setProperty("driver", "org.postgresql.Driver")
  prop
}
val df = spark.sql("""select * from table """)
df.write.mode("append").option("driver", "org.postgresql.Driver").jdbc(url, "tablename", getProperties)
After that, you can check the table in the postgres database. Also look at the different save modes Spark provides and pick the one that suits your case.

A Python equivalent of @Ghost9's answer: initialize the Spark session with the postgres driver package as follows (check for the correct version):
spark = SparkSession \
.builder \
.appName("Python Spark SQL basic example") \
    .config("spark.jars.packages", "org.postgresql:postgresql:42.2.5") \
.getOrCreate()
Then you can write to the database over JDBC with a function like this:
def connect_to_sql(
    spark, df, jdbc_hostname, jdbc_port, database, data_table, username, password
):
    jdbc_url = "jdbc:postgresql://{0}:{1}/{2}".format(jdbc_hostname, jdbc_port, database)
    connection_details = {
        "user": username,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    df.write.jdbc(url=jdbc_url, table=data_table, mode="append", properties=connection_details)
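As a quick sanity check on the URL this helper builds, the same format string can be exercised on its own; the hostname, port, and database below are placeholder values:

```python
# Hypothetical connection details; replace with your own.
jdbc_hostname = "localhost"
jdbc_port = 5432
database = "ingestiondb"

# Same format string used inside connect_to_sql above.
jdbc_url = "jdbc:postgresql://{0}:{1}/{2}".format(jdbc_hostname, jdbc_port, database)
print(jdbc_url)  # jdbc:postgresql://localhost:5432/ingestiondb
```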
The different save modes are:
append: Append contents of this DataFrame to existing data.
overwrite: Overwrite existing data.
ignore: Silently ignore this operation if data already exists.
error (default case): Throw an exception if data already exists.
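A minimal sketch of validating the mode argument before handing it to df.write.jdbc; the set below mirrors the four modes listed above (newer Spark versions also accept "errorifexists" as an alias for "error", and Spark itself raises on an unknown mode anyway):

```python
# The save modes listed above; "errorifexists" is a newer alias for "error".
VALID_MODES = {"append", "overwrite", "ignore", "error"}

def check_mode(mode: str) -> str:
    # Normalize case and reject anything outside the known set.
    mode = mode.lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    return mode

print(check_mode("Append"))  # append
```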