Python: importing a module (nltk) causes multiprocessing to hang
I traced a Python multiprocessing puzzle all the way back to the import of a module (nltk). Reproducible (hopefully) code is pasted below. This makes no sense to me; does anyone have any ideas?
```python
from __future__ import print_function  # lets the print() calls run on 2.7 and 3.x
from multiprocessing import Pool
import time

import requests
#from nltk.corpus import stopwords  # uncomment this and it hangs


def gethtml(key, url):
    # Download the page; combined with the nltk import, this is what hangs.
    r = requests.get(url)
    return r.text


def getnothing(key, url):
    # Control case with no network access; this one always works.
    return "nothing"


if __name__ == '__main__':
    pool = Pool(processes=4)
    result = list()
    nruns = 4
    url = 'http://davidchao.typepad.com/webconferencingexpert/2013/08/gartners-magic-quadrant-for-cloud-infrastructure-as-a-service.html'
    for i in range(0, nruns):
        # print(gethtml(i, url))
        result.append(pool.apply_async(gethtml, [i, url]))
        # result.append(pool.apply_async(getnothing, [i, url]))
    pool.close()

    # monitor jobs until they complete
    running = nruns
    while running > 0:
        time.sleep(1)
        running = 0
        for run in result:
            if not run.ready():
                running += 1
        print("processes still running:", running)

    # print results
    for i, run in enumerate(result):
        print(i, run.get()[0:40])
```
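The monitoring loop in the script polls `ready()` once a second and never exits if a worker hangs, which is exactly the failure being chased. A minimal sketch of an alternative, assuming Python 3: wait on each `AsyncResult` with `get(timeout)` so a stuck job surfaces as `multiprocessing.TimeoutError` instead of an endless loop. `collect_results`, the placeholder URL and the 30-second timeout are illustrative choices, not part of the original script, and the demo uses the offline `getnothing` task.

```python
import multiprocessing
from multiprocessing import Pool


def getnothing(key, url):
    # Offline stand-in task, same as in the question.
    return "nothing"


def collect_results(async_results, timeout):
    """Wait on each AsyncResult for at most `timeout` seconds.

    Returns (index, value) pairs. A job that has not finished in time is
    reported with value None (this assumes the tasks never return None
    themselves), so a hung worker shows up instead of blocking forever.
    """
    collected = []
    for i, res in enumerate(async_results):
        try:
            collected.append((i, res.get(timeout)))
        except multiprocessing.TimeoutError:
            collected.append((i, None))
    return collected


if __name__ == "__main__":
    url = "http://example.com/"  # placeholder; getnothing ignores it
    pool = Pool(processes=4)
    jobs = [pool.apply_async(getnothing, (i, url)) for i in range(4)]
    pool.close()
    for i, value in collect_results(jobs, timeout=30):
        print(i, "timed out" if value is None else value[:40])
    # terminate() rather than join(): a hung worker would block join().
    pool.terminate()
```

Because each `get` waits up to `timeout` seconds in turn, the worst case is roughly `len(async_results) * timeout` before every stuck job has been reported.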
Note that the `getnothing` function works. It is only the combination of the nltk import and the requests call that hangs. Sigh.
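If the hang comes from state that the nltk import sets up in the parent and that the forked workers then inherit (an assumption; the post does not pin down the cause), one commonly suggested mitigation is to start the workers with the "spawn" method, so each one is a fresh interpreter rather than a copy of the parent. This needs Python 3.4+; on 2.7 the pool always forks on POSIX. `pool_context` and `make_pool` are hypothetical helper names for this sketch.

```python
import multiprocessing


def pool_context(method="spawn"):
    # "spawn" starts each worker as a fresh interpreter instead of a fork
    # of the parent, so whatever the parent set up while importing nltk is
    # not inherited. "forkserver" (POSIX only) is a similar alternative.
    return multiprocessing.get_context(method)


def make_pool(processes=4, method="spawn"):
    # Hypothetical drop-in replacement for Pool(processes=4) in the question.
    return pool_context(method).Pool(processes=processes)
```

In the question's script, `pool = make_pool(4)` would replace `pool = Pool(processes=4)`. Under "spawn" the workers re-import the main module, so the existing `if __name__ == '__main__':` guard and the top-level definitions of `gethtml` and `getnothing` are required; a top-level nltk import is re-run in each worker as a fresh import rather than inherited.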
```console
> python --version
Python 2.7.6
> python -c 'import sys;print("%x" % sys.maxsize, sys.maxsize > 2**32)'
('7fffffffffffffff', True)
> pip freeze | grep requests
requests==2.2.1
> pip freeze | grep nltk
nltk==2.0.4
```
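I would point others with a similar problem to solutions that do not use the multiprocessing module:
1) Apache Spark, for scalability and flexibility. However, this does not seem to be a solution for Python multiprocessing as such; it looks like pyspark is also constrained by the global interpreter lock.
2) "gevent" or "twisted" for general asynchronous processing in Python.
3) grequests for asynchronous requests.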
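For download-bound work like `gethtml`, a standard-library option that avoids both forking and the multiprocessing module is a thread pool. It is not one of the three options listed, but it follows the same idea: CPython releases the GIL while a thread blocks on network I/O, so the GIL is not the bottleneck for this kind of job. A minimal Python 3 sketch; `fetch_all` is a hypothetical helper, and the fetch function is passed in so the example runs without the network.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Run fetch(url) for every url on a pool of threads.

    Results come back in the same order as `urls`. All threads share one
    interpreter (nothing is forked), and the `with` block waits for every
    call to finish before returning.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch, urls))
```

With requests installed, `fetch_all([url] * 4, lambda u: requests.get(u).text)` would mirror the four `gethtml` calls in the question.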