PySpark pickle.PicklingError: Cannot pickle files that are not opened for reading
I am getting this error when running a PySpark job on Dataproc. What could be the cause? Here is the stack trace of the error:
File "/usr/lib/python2.7/pickle.py", line 331, in save
  self.save_reduce(obj=obj, *rv)
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 553, in save_reduce
File "/usr/lib/python2.7/pickle.py", line 286, in save
  f(self, obj)  # Call unbound method with explicit self
File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
  self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
  save(v)
File "/usr/lib/python2.7/pickle.py", line 286, in save
  f(self, obj)  # Call unbound method with explicit self
File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 582, in save_file
pickle.PicklingError: Cannot pickle files that are not opened for reading
I found the problem: I was using a dictionary inside the map function. The job failed because the worker nodes could not access the dictionary I was passing into the map function. Solution:
I broadcast the dictionary and then used it inside the map function:
sc = SparkContext()
lookup_bc = sc.broadcast(lookup_dict)
Then, inside the function, I fetched the value with:
data = lookup_bc.value.get(key)
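To see why the broadcast matters, here is a minimal sketch using only the standard-library pickle module (no Spark required): a plain dict serializes fine, so it can be shipped to workers as a broadcast variable, but an open file handle cannot be pickled, and if the function passed to map() closes over one, Spark's cloudpickle fails with an error like the traceback above. The helper `can_pickle` is hypothetical, just for illustration.

```python
import pickle
import tempfile

def can_pickle(obj):
    """Return True if obj survives a pickle round-trip, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False

# A plain dict is picklable, so it is safe to broadcast to workers.
lookup_dict = {"a": 1, "b": 2}
print(can_pickle(lookup_dict))  # True

# An open file handle is NOT picklable; referencing one (directly or via a
# closure) in a map function triggers the PicklingError seen above.
with tempfile.TemporaryFile() as fh:
    print(can_pickle(fh))  # False
```

The broadcast variable sidesteps this: `sc.broadcast(lookup_dict)` serializes the dict once on the driver and makes it available on every worker, so the map function only needs to close over the lightweight broadcast handle.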
Hope this helps. — I ran into the same problem in PySpark. @ramanand did you find a solution? — Yes: I was reading a dictionary in the map function but had not broadcast it, so the worker nodes could not find the dictionary and threw the pickle exception.