PySpark: How do I disable broadcast in a Databricks notebook?
When running a query in Databricks/PySpark, I get the following error:
org.apache.spark.SparkException: Could not execute broadcast in 300 secs. You can increase the timeout for broadcasts via spark.sql.broadcastTimeout or disable broadcast join by setting spark.sql.autoBroadcastJoinThreshold to -1
How can I do this programmatically (in Python) in a Databricks notebook? I tried the following:
>>> spark.sql.autoBroadcastJoinThreshold(-1)
result:
AttributeError: 'function' object has no attribute 'autoBroadcastJoinThreshold'
>>> spark.sql.autoBroadcastJoinThreshold = -1
result:
AttributeError: 'method' object has no attribute 'autoBroadcastJoinThreshold'
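Perhaps
spark.sql.autoBroadcastJoinThreshold
is a property key that can somehow be set to -1, but I have not found any documentation on how to do this from Python. The cluster configuration page for Spark settings is where this property can be specified.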
I ran this command in Databricks before executing the join, and it worked:
spark.conf.set("spark.sql.broadcastTimeout", "-1")
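Hello and welcome to Stack Overflow! Please take the tour. Thanks for your answer, but could you add an explanation of how the code solves the problem? Please also check the help on how to format code.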
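For reference, the error message names two separate properties: spark.sql.broadcastTimeout (how long to wait for a broadcast) and spark.sql.autoBroadcastJoinThreshold (set to -1 to disable automatic broadcast joins). The command in the answer changes the first one. Below is a minimal sketch of setting the second one through the same spark.conf.set call. The helper name disable_auto_broadcast and its timeout_secs parameter are hypothetical, introduced only for illustration, and the sketch assumes the object passed in behaves like the spark session that Databricks notebooks predefine.

```python
def disable_auto_broadcast(spark, timeout_secs=None):
    """Disable automatic broadcast joins on a SparkSession-like object.

    Hypothetical helper, not part of PySpark. `spark` is assumed to expose
    `conf.set` and `conf.get`, as the `spark` object in a Databricks
    notebook does.
    """
    # Per the error message, -1 disables automatic broadcast joins.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    if timeout_secs is not None:
        # Optionally also change the broadcast timeout (in seconds).
        spark.conf.set("spark.sql.broadcastTimeout", str(timeout_secs))

    # Read the value back so the caller can confirm what is now in effect.
    return spark.conf.get("spark.sql.autoBroadcastJoinThreshold")


# In a Databricks notebook, where `spark` already exists:
#   disable_auto_broadcast(spark)
```

Run it in a cell before the join. The setting applies to the current Spark session only, so it would need to be set again in a new session, or placed in the cluster's Spark configuration as the question mentions.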