PySpark won't let me create buckets
When I run the following:
(
df
.write
.partitionBy('Source')
.bucketBy(8,'destination')
.saveAsTable('flightdata')
)
AttributeError Traceback (most recent call last)
in ()
----> 1 df.write.bucketBy(2, "source").saveAsTable("table")
AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'

It looks like bucketBy is only supported from Spark 2.3.0 onwards.
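Conceptually, bucketBy(numBuckets, col) routes each row to one of numBuckets buckets by hashing the bucket column, so identical keys always land together. A minimal pure-Python sketch of that idea (note: Spark actually uses a Murmur3 hash, so Python's built-in hash() here is only a stand-in and bucket assignments will not match Spark's):

```python
# Sketch of hash-based bucketing, as bucketBy(numBuckets, col) does.
# Assumption: Python's hash() stands in for Spark's Murmur3 hash,
# so the exact bucket numbers differ from Spark's.

def bucket_for(value, num_buckets):
    """Return the bucket index for a value (hypothetical helper)."""
    return hash(value) % num_buckets

destinations = ["JFK", "LAX", "SFO", "JFK"]
buckets = [bucket_for(d, 8) for d in destinations]

# Identical keys always land in the same bucket, which is what lets
# bucketed tables avoid a shuffle when joined on the bucket column.
assert buckets[0] == buckets[3]
```
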
You could try creating a new bucket column instead:
from pyspark.ml.feature import Bucketizer
bucketizer = Bucketizer(splits=[0, float('Inf')], inputCol="destination", outputCol="buckets")
df_with_buckets = bucketizer.setHandleInvalid("keep").transform(df)
and then use partitionBy(*cols):
df_with_buckets.write.partitionBy('buckets').saveAsTable('table')
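For intuition, Bucketizer maps each value to the index of the splits interval it falls into. A rough pure-Python equivalent of that mapping (bucketize is a hypothetical helper for illustration, not part of the PySpark API):

```python
from bisect import bisect_right

def bucketize(value, splits):
    """Map a value to the index of the [splits[i], splits[i+1]) interval
    it falls into, mimicking what pyspark.ml.feature.Bucketizer computes.
    Hypothetical helper, not the real PySpark API."""
    if value < splits[0] or value > splits[-1]:
        return None  # out-of-range; roughly what handleInvalid deals with
    # bisect_right gives the insertion point; subtract 1 for the bucket
    # index, and clamp so the last bucket is right-inclusive.
    return min(bisect_right(splits, value) - 1, len(splits) - 2)

splits = [0.0, 10.0, 100.0, float("inf")]
assert bucketize(5.0, splits) == 0
assert bucketize(10.0, splits) == 1   # splits are left-inclusive
assert bucketize(250.0, splits) == 2
```

Note that with only two splits, as in the answer's [0, float('Inf')], every non-negative value falls into a single bucket, so for meaningful partitions you would pass more split points.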