PySpark Streaming cannot recover from a checkpoint when using broadcast variables
I created a PySpark Streaming application that uses checkpointing and broadcast variables. The first start succeeds, but when I try to recover it from the checkpoint, it fails with the following error:
Caused by: org.apache.spark.SparkException: An exception was raised by Python:
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.5/site-packages/pyspark/streaming/util.py", line 123, in loads
    f, wrap_func, deserializers = self.serializer.loads(bytes(data))
  File "/usr/local/python3/lib/python3.5/site-packages/pyspark/serializers.py", line 580, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/usr/local/python3/lib/python3.5/site-packages/pyspark/broadcast.py", line 46, in _from_id
    raise Exception("Broadcast variable '%s' not loaded!" % bid)
Exception: Broadcast variable '1' not loaded!
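The official documentation says you must create lazily instantiated singleton instances for broadcast variables, so I wrote this method: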
def get_broadcast_tables(self, table: str):
    if table not in globals():
        globals()[table] = self.sc.broadcast(table)
    return globals()[table]
I call it as bc = self.get_broadcast_tables("my_table"), but it still raises an error. How should I handle this?
My Spark version is 2.4.3 and my Python version is 3.5.
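
For reference, the lazily instantiated singleton pattern mentioned above can be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: the streaming code is not shown in the question, so it assumes the broadcast is currently obtained while the DStream graph is being defined and therefore ends up pickled into the checkpoint. The names get_broadcast_table, load_table and process_rdd are hypothetical, and load_table is only a placeholder for however the table data is really loaded. The key point is that the broadcast is looked up inside the per-batch function through rdd.context, so it can be created again after the driver restarts.

```python
# Sketch only: all function names here are hypothetical illustrations.

def load_table(table):
    """Placeholder loader; replace with the real logic that reads the table."""
    return {"name": table}


def get_broadcast_table(spark_context, table):
    """Return the cached broadcast for `table`, creating it on first use."""
    key = "broadcast_" + table
    if key not in globals():
        # Runs once per driver process, including after a restart from checkpoint.
        globals()[key] = spark_context.broadcast(load_table(table))
    return globals()[key]


def process_rdd(time, rdd):
    """Per-batch function: fetch the broadcast here, not at graph definition time."""
    # rdd.context is the live SparkContext of the current (possibly restarted) driver.
    bc = get_broadcast_table(rdd.context, "my_table")
    return rdd.map(lambda row: (row, bc.value["name"]))
```

A function like process_rdd would typically be passed to DStream.transform (or adapted for foreachRDD), for example stream.transform(process_rdd), instead of calling the getter once at setup time and capturing the result in a closure. The sketch also takes the context from the RDD rather than from self.sc, on the assumption that a stored context should not be carried into the checkpointed function.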