Apache Spark: How to specify the path where saveAsTable saves files to?

apache-spark pyspark

I am trying to save a DataFrame to S3 from pyspark in Spark 1.4 using DataFrameWriter:

df = sqlContext.read.format("json").load("s3a://somefile")
df_writer = pyspark.sql.DataFrameWriter(df)
df_writer.partitionBy('col1')\
         .saveAsTable('test_table', format='parquet', mode='overwrite')

The Parquet files went to "/tmp/hive/warehouse/…", which is a local tmp directory on my driver.

I did set hive.metastore.warehouse.dir in hive-site.xml to an "s3a://.." location, but Spark doesn't seem to respect my Hive warehouse setting.

Use the path option:

df_writer.partitionBy('col1')\
         .saveAsTable('test_table', format='parquet', mode='overwrite',
                      path='s3a://bucket/foo')
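
The same call can also be written against df.write, the DataFrameWriter that a DataFrame exposes since Spark 1.4, instead of constructing the writer by hand. The following is only a sketch: the helper name save_partitioned_table and its parameters are made up for illustration, and the bucket path is the placeholder from the answer above.

```python
def save_partitioned_table(df, table_name, path, partition_col, mode="overwrite"):
    """Save df as a Parquet table whose files live under an explicit path.

    Hypothetical helper: the name and parameters are illustrative only.
    df is expected to be a pyspark.sql.DataFrame.
    """
    (df.write
       .partitionBy(partition_col)   # one sub-directory per distinct value
       .format("parquet")
       .mode(mode)
       .option("path", path)         # explicit location instead of the warehouse dir
       .saveAsTable(table_name))


# Usage, assuming the sqlContext from the question:
#   df = sqlContext.read.format("json").load("s3a://somefile")
#   save_partitioned_table(df, "test_table", "s3a://bucket/foo", "col1")
```

Passing path as a keyword argument to saveAsTable, as in the answer, and setting it with option("path", ...) should be equivalent, since both end up as writer options.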

You can use insertInto(tablename) to overwrite an existing table since 1.4.
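
A minimal sketch of that call, assuming the table already exists and its columns are in the same order as the DataFrame's (insertInto matches columns by position, not by name). The helper name overwrite_existing_table is hypothetical.

```python
def overwrite_existing_table(df, table_name):
    """Replace the contents of an existing table with the rows of df.

    Hypothetical helper for illustration; df is expected to be a
    pyspark.sql.DataFrame and table_name must already exist.
    """
    # overwrite=True replaces the existing data instead of appending to it
    df.write.insertInto(table_name, overwrite=True)


# Usage, assuming test_table was created earlier with saveAsTable:
#   overwrite_existing_table(df, "test_table")
```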

It uses "column name=" in the directory names, like s3a://bucket/foo/col1=1/, s3a://bucket/foo/col1=2/, s3a://bucket/foo/col1=3/, ... Is there a way to avoid prepending the column name? Like s3a://bucket/foo/1/, s3a://bucket/foo/2/
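
For context, the "column name=value" directories are the Hive-style partition layout that partitionBy writes, and the column name in each directory is what lets a reader recover the partition column from the path alone. The sketch below is plain Python, not Spark's own code; the function names are made up for illustration. It shows why a bare s3a://bucket/foo/1/ directory would not carry enough information.

```python
def partition_dir(base, column, value):
    """Build a Hive-style partition directory, e.g. base/col1=1/."""
    return "{}/{}={}/".format(base.rstrip("/"), column, value)


def parse_partition_dir(path):
    """Recover (column, value) from the last segment of a partition path.

    Raises ValueError when the segment has no 'column=' prefix, because
    the column name cannot be inferred from the value alone.
    """
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    column, sep, value = segment.partition("=")
    if not sep or not column:
        raise ValueError("no column name in partition directory: " + segment)
    return column, value


print(partition_dir("s3a://bucket/foo", "col1", 1))
print(parse_partition_dir("s3a://bucket/foo/col1=2/"))
```

As far as I know, partitionBy has no option for dropping the prefix, so getting s3a://bucket/foo/1/ would likely mean writing each subset to its own path by hand.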