Python NLTK: converting a nested for loop to multiprocessing

python loops nlp

I currently have a nested for loop that builds a list. I am trying to produce the same output using multiprocessing.

My current code is:

output = []
for test in test_data:
    output.append([(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
                   for ngram in test])

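Here test_data is a generator object, and model.score comes from the NLTK package.

None of the solutions I have found and tried work (at least not in my case).

Is there a way to get the same output with multiprocessing?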

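When it comes to multiprocessing, I think the easiest approach is the joblib package. To use it, you only need to write a function that takes one item from the generator and returns the result for that item.

In your case, it looks like this: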

from joblib import Parallel, delayed

def func(test):
    # Score every n-gram in one item of the generator.
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


# Results come back in the same order as the items in test_data.
output = Parallel(n_jobs=4, backend="threading")(
    delayed(func)(test) for test in test_data)

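Now output is the output you are looking for. You can change the number of jobs as you like, but I recommend setting it to multiprocessing.cpu_count(), the number of CPUs on your machine.

More examples are available in the joblib documentation.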
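To convince yourself that a parallel version returns the same list as the original loop, it can help to compare the two on a small input first. Below is a minimal sketch that uses only the standard library (concurrent.futures rather than joblib) so it runs without extra packages. StubModel and make_test_data are hypothetical stand-ins for the real NLTK model and the real generator; they are not part of NLTK. Note that threads, as with backend="threading" above, may not speed up CPU-bound pure-Python work because of the GIL, so a process-based backend may be worth trying on the real data.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


class StubModel:
    """Hypothetical stand-in for the NLTK model; only mimics score(word, context)."""

    def score(self, word, context=None):
        # Deterministic fake score so both code paths can be compared.
        return 1.0 / (1 + len(word) + len(context or ()))


model = StubModel()


def score_sentence(test):
    # Same per-item work as the body of the original loop.
    return [(ngram[-1], ngram[:-1], model.score(ngram[-1], ngram[:-1]))
            for ngram in test]


def make_test_data():
    # Hypothetical stand-in for the real generator of n-gram sequences.
    yield [("a", "b", "c"), ("b", "c", "d")]
    yield [("x", "y", "z")]
    yield []


# Sequential reference: the original nested loop.
expected = []
for test in make_test_data():
    expected.append(score_sentence(test))

# Parallel version: Executor.map yields results in input order.
with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as pool:
    output = list(pool.map(score_sentence, make_test_data()))

assert output == expected
```

If the assertion passes on a small sample, the same func can be handed to joblib as shown above.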

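Thanks, Anwarvic. I realized that the multiprocessing function I had written earlier was correct (and so is yours!); the problem is that the NLTK package is still slow. For context, I am running this function over roughly 300 million n-grams.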
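If many of those n-grams repeat, one thing that might help is caching scores so each distinct (word, context) pair is computed only once. The following is a sketch under assumptions, not a measured fix: it assumes the n-grams are tuples (so the context is hashable) and that model.score is deterministic. CountingModel and make_cached_score are hypothetical names used only for illustration.

```python
from functools import lru_cache


class CountingModel:
    """Hypothetical stand-in for the NLTK model that counts score() calls."""

    def __init__(self):
        self.calls = 0

    def score(self, word, context=None):
        self.calls += 1
        return 1.0 / (1 + len(word) + len(context or ()))


def make_cached_score(model, maxsize=None):
    # Wrap model.score so each distinct (word, context) pair is computed once.
    # context must be hashable, e.g. a tuple.
    @lru_cache(maxsize=maxsize)
    def cached_score(word, context):
        return model.score(word, context)

    return cached_score


model = CountingModel()
cached_score = make_cached_score(model)

ngrams = [("a", "b", "c"), ("a", "b", "c"), ("b", "c", "d"), ("a", "b", "c")]
scored = [(ng[-1], ng[:-1], cached_score(ng[-1], ng[:-1])) for ng in ngrams]

# Only the two distinct n-grams reached the underlying model.
assert model.calls == 2
```

With maxsize=None the cache grows without bound, so for hundreds of millions of n-grams a finite maxsize is probably safer. With process-based workers, each process keeps its own separate cache.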