Python can't create more than 800 threads


Below is my code; I'm very new to Python. With the code below I end up creating more than 1000 threads, but at some point, at around 800 threads, I get an error saying: error: can't start new thread. I did read some articles about thread pools, but I don't really understand them. How can I implement a thread pool in my code? Or at least, please explain it to me in a simple way.

    #!/usr/bin/python


    import threading
    import urllib

    lock = threading.Lock()

    def get_wip_info(query_str):
        try:
            temp = urllib.urlopen(query_str).read()
        except:
            temp = 'ERROR'
        return temp

    def makeURLcall(serial, stationcode, testername, file_output, dowhat, result):

        url1 = "some URL call with args"
        url2 = "some URL call with args"

        if dowhat == "IN":
            result = get_wip_info(url1)
        elif dowhat == "OUT":
            result = get_wip_info(url2)

        # Serialize appends to the shared output file across threads
        lock.acquire()

        report = open(file_output, "a")
        report.writelines("%s - %s\n" % (serial, result))
        report.close()

        lock.release()

        return


    testername = "arg1"
    stationcode = "arg2"
    dowhat = "OUT"
    result = "PASS"
    file_source = "sourcefile.txt"
    file_output = "resultfile.txt"

    readfile = open(file_source, "r")
    Data = readfile.readlines()

    threads = []

    for SNs in Data :
        SNs = SNs.strip()
        print SNs
        thread = threading.Thread(target=makeURLcall, args=(SNs, stationcode, testername, file_output, dowhat, result))
        thread.start()

        threads.append(thread)

    for thread in threads :
        thread.join()

Don't implement your own thread pool; use the one that ships with Python.

On Python 3, you can use concurrent.futures.ThreadPoolExecutor to get a thread pool explicitly; on Python 2.6 and higher, you can use multiprocessing.dummy, which exposes the same API as multiprocessing but is backed by threads instead of processes.

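For the Python 3 route, here is a minimal sketch using concurrent.futures; the fetch helper and the URL list are illustrative placeholders, not part of the original post:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen  # Python 3 location of urllib.urlopen

def fetch(url):
    # Download one URL, returning a marker on failure like get_wip_info does
    try:
        return urlopen(url).read()
    except IOError:
        return b'ERROR'

# Hypothetical list of URLs to fetch
urls = ["http://example.com/item/%d" % i for i in range(1000)]

# 32 worker threads service all 1000 jobs one after another,
# so the per-process thread limit is never approached
with ThreadPoolExecutor(max_workers=32) as executor:
    for page in executor.map(fetch, urls):
        print(len(page))
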
Of course, if you need to do CPU-bound work on the reference interpreter, CPython, you should use the real multiprocessing, not multiprocessing.dummy; Python threads are fine for I/O-bound work, but quite bad for CPU-bound work.

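To make the distinction concrete, a small sketch: moving CPU-bound work onto real processes is just a matter of importing Pool from multiprocessing rather than multiprocessing.dummy (the checksum function here is a hypothetical stand-in for CPU-bound work):

from contextlib import closing
from multiprocessing import Pool  # real processes; multiprocessing.dummy gives threads

def checksum(n):
    # Pure-Python CPU-bound work; with threads, the GIL would serialize this
    return sum(i * i for i in range(n))

if __name__ == '__main__':  # required where processes are spawned, e.g. Windows
    # Same Pool API as multiprocessing.dummy, so switching is a one-line change
    with closing(Pool(4)) as pool:
        print(pool.map(checksum, [10 ** 6] * 8))
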
The code below replaces your explicit use of threads with a multiprocessing.dummy pool: a fixed number of workers, each of which finishes one task and pulls the next as fast as it can, instead of an unbounded number of threads with one job each. First, since the local I/O is likely quite cheap and you want the output synchronized, we'll have the worker task return the result data rather than write it itself, and let the main thread do the writes to local disk; that removes the need for the lock, and for opening the file over and over. This changes makeURLcall to:

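The rewritten makeURLcall did not survive in this copy of the answer; the sketch below reconstructs it from the description above, assuming the URL-building logic is unchanged. It takes a single tuple (Pool.imap_unordered passes one object per task) and returns the output line instead of writing it:

def makeURLcall(args):
    # imap_unordered hands each task to a worker as one object,
    # so unpack the argument tuple here
    serial, stationcode, testername, dowhat, result = args

    url1 = "some URL call with args"
    url2 = "some URL call with args"

    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)

    # Return the line; the main thread owns the output file, so no
    # lock is needed and the file is opened exactly once
    return "%s - %s\n" % (serial, result)
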
Now, for the code that replaces the explicit use of threads:

import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create pool
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be the
# sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define generator that creates tuples of arguments to pass to makeURLcall
    # We also read the file in lazily instead of using readlines, to
    # start producing results faster
    tasks = ((SNs.strip(), stationcode, testername, dowhat, result) for SNs in inf)
    # Pulls and writes results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))

# Once we leave the with block, input and output files are closed, and
# pool workers are cleaned up

Hi, thanks a lot for your reply. Is it as simple as adding multiprocessing.Pool(1000)? Does that mean I can now create 1000 threads?

@BarathanR: No. Your system evidently supports no more than roughly 800 threads; whether that's down to a ulimit or to not having enough room for that many thread stacks is beside the point, it's simply your limit. The point of using a pool is to have a fixed number of workers well under that cap. In practice, even for an I/O-bound task, you're unlikely to gain anything from 800 threads running at once; you'd have to test, but I suspect the biggest wins from parallelism come somewhere between 16 and 128 threads, since at some point you saturate the network anyway. When a worker finishes a task, it just grabs the next job and starts on it. I'll add some example code shortly.

@BarathanR: Code example added. I can't promise it will run as-is, since your own code isn't runnable, but it should be very close.

By the way, why do you need that many threads? Threads carry memory and CPU context-switching overhead, which will degrade performance; Windows, for instance, allocates a 1 MB stack for every thread. If you're requesting a bunch of URLs, you may want to consider doing asynchronous I/O instead. Take a look at
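For completeness, a rough sketch of the asynchronous-I/O approach that comment alludes to, using Python 3 and the third-party aiohttp package; aiohttp is an assumption here, since the link in the original comment did not survive:

import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch(session, url):
    # One lightweight coroutine per URL; no OS thread per request
    try:
        async with session.get(url) as resp:
            return await resp.text()
    except aiohttp.ClientError:
        return 'ERROR'

async def main(urls):
    # A single session pools connections across all requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

urls = ["http://example.com/item/%d" % i for i in range(1000)]
results = asyncio.run(main(urls))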