Pyspark won't let me create buckets


Pyspark is not allowing me to create buckets:

(
    df
    .write
    .partitionBy('Source')
    .bucketBy(8, 'destination')
    .saveAsTable('flightdata')
)

AttributeError                            Traceback (most recent call last)
 in ()
----> 1 df.write.bucketBy(2, "Source").saveAsTable("table")


AttributeError: 'DataFrameWriter' object has no attribute 'bucketBy'

It looks like bucketBy is only supported in the Python API from Spark 2.3.0 onwards.
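If upgrading is an option, the write from the question should work as-is on Spark 2.3.0+. A minimal sketch, assuming a running SparkSession named spark and a small hypothetical flight DataFrame standing in for the question's df:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bucketing-demo").getOrCreate()
print(spark.version)  # bucketBy is in the Python DataFrameWriter API from 2.3.0

# hypothetical stand-in for the question's df
df = spark.createDataFrame(
    [("JFK", "LAX"), ("JFK", "SFO"), ("ORD", "LAX")],
    ["Source", "destination"],
)

(
    df
    .write
    .partitionBy('Source')       # one directory per Source value
    .bucketBy(8, 'destination')  # 8 hash buckets on destination within each partition
    .saveAsTable('flightdata')   # bucketBy requires saveAsTable, not save()
)

On earlier versions, you can work around it as follows.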

You can try creating a new bucket column instead:

from pyspark.ml.feature import Bucketizer

# Bucketizer needs a numeric input column and at least 3 strictly increasing split points
bucketizer = Bucketizer(splits=[-float('inf'), 0, float('inf')], inputCol="destination", outputCol="buckets")
df_with_buckets = bucketizer.setHandleInvalid("keep").transform(df)
Then partition on the new column with partitionBy(*cols):

df_with_buckets.write.partitionBy('buckets').saveAsTable('table')
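To sanity-check the result, you could read the table back and count rows per bucket value (a hypothetical follow-up, assuming the same SparkSession):

# each distinct value of 'buckets' becomes its own partition directory
spark.table("table").groupBy("buckets").count().show()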