My dask code seems to work in multithreaded mode but fails in multiprocessing mode


When I try to perform a groupby with dask, the code fails:

other_df = ddf.groupby(
    by=[self.phone_field, self.state_field]
).apply(
    lambda x: self.obtain_cluster_nos_weighted_levenshtein(x.copy()),
    meta={self.address_id_field: "f8",
          self.add_clust_field: "i8"}
).compute(scheduler='processes')
Here is the traceback:

  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/dask/local.py", line 461, in fire_task
    dumps((dsk[key], data)),
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 102, in dumps
    cp.dump(obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 563, in dump
    return Pickler.dump(self, obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 437, in dump
    self.save(obj)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 789, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 745, in save_function
    *self._dynamic_function_reduce(obj), obj=obj
  File "/home/ec2-user/anaconda3/lib/python3.7/site-packages/cloudpickle/cloudpickle_fast.py", line 687, in _save_reduce_pickle5
    save(state)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 638, in save_reduce
    save(args)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 774, in save_tuple
    save(element)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 662, in save_reduce
    save(state)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 859, in save_dict
    self._batch_setitems(obj.items())
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 885, in _batch_setitems
    save(v)
  File "/home/ec2-user/anaconda3/lib/python3.7/pickle.py", line 524, in save
    rv = reduce(self.proto)
NotImplementedError: object proxy must define __reduce_ex__()

My guess is that this is related to pickling the task before it is dispatched to the workers. The problem only appears with scheduler='processes'; with the multithreaded scheduler it runs fine. How can I fix this?

Calling a class method under multiprocessing turned out to be a bad idea. I declared obtain_cluster_nos_weighted_levenshtein as a standalone function instead of a method on self, and my problem was solved.
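A minimal sketch of that refactor (field names here are hypothetical stand-ins for the ones on the original class): move the logic to a module-level function and bind any configuration it needs with functools.partial, which stays picklable as long as the bound arguments are.

```python
import functools
import pickle

# Standalone version of the clustering step: everything it needs is
# passed in explicitly instead of being read off `self`.
def obtain_cluster_nos_weighted_levenshtein(df, address_id_field, add_clust_field):
    # ... the real clustering logic would go here ...
    return df

# Bind the configuration up front; the resulting callable can then be
# handed to ddf.groupby(...).apply(apply_fn, meta=...) directly.
apply_fn = functools.partial(
    obtain_cluster_nos_weighted_levenshtein,
    address_id_field="address_id",
    add_clust_field="cluster_no",
)

# Verify it survives a pickle round trip, which is what the
# 'processes' scheduler does before shipping tasks to workers.
restored = pickle.loads(pickle.dumps(apply_fn))
```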

When you pass a bound method (or a lambda that closes over self), you are effectively serializing the whole instance along with it, and any one of the instance's attributes may be unpicklable, which is what the NotImplementedError here is complaining about.
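This can be demonstrated with plain pickle (cloudpickle behaves analogously): a bound method drags the instance, and any unpicklable attribute on it, into the serialization, while a module-level function pickles by reference. The class and attribute below are illustrative; the lock stands in for the "object proxy" from the traceback.

```python
import pickle
import threading

class Clusterer:
    def __init__(self):
        # An attribute that cannot be pickled, standing in for
        # whatever unpicklable state the real class holds.
        self._lock = threading.Lock()

    def obtain_cluster_nos(self, df):
        return df

def obtain_cluster_nos(df):
    # Module-level version: pickling it only records its qualified
    # name, so the instance (and its lock) are never serialized.
    return df

c = Clusterer()

# The bound method pickles the instance too, so this fails.
try:
    pickle.dumps(c.obtain_cluster_nos)
    bound_ok = True
except TypeError as exc:
    bound_ok = False
    print("bound method failed:", exc)

# The standalone function pickles without touching the instance.
payload = pickle.dumps(obtain_cluster_nos)
```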