Python can't create more than 800 threads
Below is my code; I'm very new to Python. With the code below I actually create well over 1000 threads, but at some point, at close to 800 threads, I get an error saying `error: can't start new thread`. I did read some articles about thread pools, but I really don't understand them. How do I implement a thread pool in my code? Or at least, please explain it to me in a simple way.
#!/usr/bin/python
import threading
import urllib

lock = threading.Lock()

def get_wip_info(query_str):
    try:
        temp = urllib.urlopen(query_str).read()
    except:
        temp = 'ERROR'
    return temp

def makeURLcall(serial, stationcode, testername, file_output, dowhat, result):
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    lock.acquire()
    report = open(file_output, "a")
    report.writelines("%s - %s\n" % (serial, result))
    report.close()
    lock.release()
    return

testername = "arg1"
stationcode = "arg2"
dowhat = "OUT"
result = "PASS"
file_source = "sourcefile.txt"
file_output = "resultfile.txt"

readfile = open(file_source, "r")
Data = readfile.readlines()

threads = []
for SNs in Data:
    SNs = SNs.strip()
    print SNs
    thread = threading.Thread(target=makeURLcall,
                              args=(SNs, stationcode, testername, file_output, dowhat, result))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Don't implement your own thread pool; use the one that ships with Python. On Python 3 you can use concurrent.futures.ThreadPoolExecutor to use threads explicitly; on Python 2.6 and higher you can use multiprocessing.dummy, which mirrors the multiprocessing API but is backed by threads instead of processes.

Of course, if you need to do CPU-bound work in CPython (the reference interpreter), you should use the real multiprocessing module, not multiprocessing.dummy; Python threads are fine for I/O-bound work, but quite bad for CPU-bound work.

The code below replaces your explicit use of threads with a multiprocessing.dummy pool: a fixed number of workers, each finishing one task and picking up the next as fast as it can, instead of an unbounded number of one-job threads. First, since local I/O is likely to be fairly cheap and you want the output synchronized, we'll have the worker task return the result data instead of writing it out itself, and let the main thread do the writes to local disk; that removes both the need for the lock and the need to reopen the file over and over. This changes makeURLcall so it takes a single argument tuple and returns the output line. Now, for the code that replaces the explicit threading, use:
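The rewritten makeURLcall was not preserved in this copy of the answer, but the pool code below pins down its shape: it must accept one argument tuple and return the output line rather than write it. A sketch consistent with that (the cross-version urlopen import and get_wip_info are repeated from the question so the block stands alone; the unused parameters just mirror the original signature):

```python
try:
    from urllib import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

def get_wip_info(query_str):
    try:
        return urlopen(query_str).read()
    except Exception:
        return 'ERROR'

def makeURLcall(args):
    # Pool workers receive exactly one argument, so unpack the tuple here
    serial, stationcode, testername, dowhat, result = args
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    # Return the line instead of writing it; the main thread does all file I/O,
    # so no lock is needed
    return "%s - %s\n" % (serial, result)
```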
import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create pool
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be the
# sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define generator that creates tuples of arguments to pass to makeURLcall
    # We also read the file in lazily instead of using readlines, to
    # start producing results faster
    tasks = ((SNs.strip(), stationcode, testername, dowhat, result) for SNs in inf)
    # Pull and write results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))
# Once we leave the with block, input and output files are closed, and
# pool workers are cleaned up
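For completeness, the Python 3 route the answer mentions, concurrent.futures.ThreadPoolExecutor, looks much the same. This is only a sketch: it uses a hypothetical fetch_line worker in place of makeURLcall so it runs standalone.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_line(args):
    # Stand-in for makeURLcall: a real worker would call get_wip_info() here
    serial, dowhat, result = args
    return "%s - %s\n" % (serial, result)

tasks = [("SN%d" % i, "OUT", "PASS") for i in range(5)]

# A fixed pool of 32 worker threads, just like mp.Pool(32)
with ThreadPoolExecutor(max_workers=32) as pool:
    # executor.map preserves input order; use as_completed() if you want
    # results as soon as each worker finishes, like imap_unordered
    lines = list(pool.map(fetch_line, tasks))

print("".join(lines), end="")
```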
Hi, thanks a lot for the reply. Is it as simple as adding multiprocessing.Pool(1000)? Does that mean I can now create 1000 threads?

@BarathanR: No. Clearly your system doesn't support more than roughly 800 threads; whether that's due to a ulimit or to not having enough memory for that many thread stacks is irrelevant, it's simply your limit. The point of a pool is to have a fixed number of worker threads below the cap; in practice, even for I/O-bound tasks, you're unlikely to get any benefit from 800 simultaneous threads. You'd need to test, but I suspect the maximum gain from parallelism lies somewhere between 16 and 128 threads; at some point you saturate the network anyway. When a worker finishes one task, it simply grabs another job and starts on it. I'll add some example code in a bit.

@BarathanR: Code example added. I can't guarantee it runs as-is, since your own code isn't runnable as posted, but it should be very close.

By the way, why do you need that many threads? Threads have memory and CPU context-switch overhead that will hurt performance; Windows, for example, allocates a 1 MB stack per thread. If you're requesting a bunch of URLs, you may want to consider doing asynchronous I/O. Take a look at
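The async-I/O route suggested in that last comment can be sketched with Python 3's asyncio: one thread multiplexes many in-flight requests, so there is no per-thread stack cost. fake_fetch here is a stand-in for a real HTTP coroutine (e.g. via aiohttp), so the block runs without any network.

```python
import asyncio

async def fake_fetch(serial):
    # A real implementation would await an HTTP request here
    await asyncio.sleep(0)
    return "%s - PASS\n" % serial

async def main(serials):
    # Launch all fetches concurrently on a single thread;
    # gather() returns results in the order the coroutines were passed
    return await asyncio.gather(*(fake_fetch(s) for s in serials))

lines = asyncio.run(main(["SN0", "SN1", "SN2"]))
print("".join(lines), end="")
```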