Python parallel computing without pickling

I have a very simple list comprehension that I would like to parallelize:

nlp = spacy.load(model)
texts = sorted(X['text'])
# TODO: Parallelize
docs = [nlp(text) for text in texts]
However, when I try to use Pool from the multiprocessing module, like this:

docs = Pool().map(nlp, texts)
it gives me the following error:

Traceback (most recent call last):
  File "main.py", line 117, in <module>
    main()
  File "main.py", line 99, in main
    docs = parse_docs(X)
  File "main.py", line 81, in parse_docs
    docs = Pool().map(nlp, texts)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\james\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
AttributeError: Can't pickle local object 'FeatureExtracter.<locals>.feature_extracter_fwd'

That does not work either.

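That most likely won't work. You are probably trying to share something low-level that is not safe to share across processes, for example an object that holds open file descriptors. As for why it is not picklable, the spaCy developers are vague about it but say it is for similar reasons. Why not load nlp separately in each process?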
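
The last line of the traceback can be reproduced without spaCy. The sketch below assumes only the standard library; make_extractor, forward, double and can_pickle are hypothetical names chosen for illustration, with make_extractor standing in for a factory such as FeatureExtracter. It shows that pickle refuses a function defined inside another function, which is what Pool.map runs into when it tries to send nlp to the workers.

```python
import pickle


def make_extractor():
    """Hypothetical stand-in for a factory such as FeatureExtracter."""
    def forward(x):
        # Nested function: its qualified name is 'make_extractor.<locals>.forward'
        return x * 2
    return forward


def double(x):
    """Module-level function; pickle stores only a reference to its name."""
    return x * 2


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


local_fn = make_extractor()

# Functions are pickled by reference to an importable name, so the
# module-level function can be looked up again in a worker process,
# while the nested one cannot.
print("module-level function picklable:", can_pickle(double))
print("nested function picklable:", can_pickle(local_fn))
```

Because the failure comes from an object held inside nlp, not from nlp being large or slow to copy, building the object inside each worker avoids pickling it at all.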


There is more on this problem here; it appears to be a general spaCy issue that they are working on:

One workaround could look like this:

import multiprocessing as mp

import spacy

texts = ["Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season.",
        "The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24\u201310 to earn their third Super Bowl title.",
        "The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California.",
        "As this was the 50th Super Bowl, the league emphasized the"]

def init():
    # Runs once in each worker process, so every worker loads its own model
    # and the nlp object never has to be pickled
    global nlp
    nlp = spacy.load('en')

def func(text):
    # Uses the model that init() loaded in this worker
    return nlp(text)

if __name__ == '__main__':
    # The guard is needed on Windows, where worker processes re-import this module
    with mp.Pool(initializer=init) as pool:
        docs = pool.map(func, texts)
which outputs:

for doc in docs:
    print(list(w.text for w in doc))

['Super', 'Bowl', '50', 'was', 'an', 'American', 'football', 'game', 'to', 'determine', 'the', 'champion', 'of', 'the', 'National', 'Football', 'League', '(', 'NFL', ')', 'for', 'the', '2015', 'season', '.']
['The', 'American', 'Football', 'Conference', '(', 'AFC', ')', 'champion', 'Denver', 'Broncos', 'defeated', 'the', 'National', 'Football', 'Conference', '(', 'NFC', ')', 'champion', 'Carolina', 'Panthers', '24–10', 'to', 'earn', 'their', 'third', 'Super', 'Bowl', 'title', '.']
['The', 'game', 'was', 'played', 'on', 'February', '7', ',', '2016', ',', 'at', 'Levi', "'s", 'Stadium', 'in', 'the', 'San', 'Francisco', 'Bay', 'Area', 'at', 'Santa', 'Clara', ',', 'California', '.']
['As', 'this', 'was', 'the', '50th', 'Super', 'Bowl', ',', 'the', 'league', 'emphasized', 'the']

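Thanks for the link. I learned that spaCy uses the dill module to pickle objects, so to avoid the pickling error I imported multiprocessing_on_dill as multiprocessing.

Ah, so spaCy 2 is out now, very new. I thought you were using spaCy 1. Nice.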