How to optimize PySpark toPandas() with type hints
I have never seen this warning in PySpark before:
The conversion of DecimalType columns is inefficient and may take a long time. Column names: [PVPERUSER] If those columns are not necessary, you may consider dropping them or converting to primitive types before the conversion.
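What is the best way to handle this? Is it a parameter to pass to toPandas(), or do I need to type the DataFrame in a particular way?
My code is a simple PySpark conversion: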
df = data.toPandas()
Try casting the DecimalType column to a primitive type before the conversion:
df = data.select(data.PVPERUSER.cast('float'), data.another_column).toPandas()
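If several columns are DecimalType, listing each one by hand gets tedious. The following is a minimal sketch, not an official PySpark recipe, that finds decimal columns from `DataFrame.dtypes` and casts each of them before calling `toPandas()`. The helper names `decimal_columns` and `to_pandas_without_decimals` are hypothetical. It casts to `double` (64-bit) rather than `float` (32-bit) to lose less precision, and it assumes that giving up exact decimal arithmetic is acceptable for your data.

```python
def decimal_columns(dtypes):
    """Return the names of columns whose Spark type string is a decimal.

    `dtypes` is the list of (name, type) pairs returned by DataFrame.dtypes,
    for example [("PVPERUSER", "decimal(10,2)"), ("id", "int")].
    """
    return [name for name, dtype in dtypes if dtype.startswith("decimal")]


def to_pandas_without_decimals(sdf):
    """Cast every DecimalType column to double, then convert to pandas.

    Assumption: approximate floating-point values are acceptable in place
    of exact decimals. Column names are assumed to contain no dots.
    """
    for name in decimal_columns(sdf.dtypes):
        # withColumn with an existing name replaces that column in place.
        sdf = sdf.withColumn(name, sdf[name].cast("double"))
    return sdf.toPandas()
```

With the DataFrame from the question, the call would be `df = to_pandas_without_decimals(data)`.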