Python can't create more than 800 threads


Below is my code; I'm very new to Python. With the code below I end up creating more than 1000 threads, but at some point, at around 800 threads, I get an error saying: error: can't start new thread. I did read some articles about thread pools, but I don't really understand them. How can I implement a thread pool in my code? Or at least, please explain it to me in a simple way.

    #!/usr/bin/python


    import threading
    import urllib

    lock = threading.Lock()

    def get_wip_info(query_str):
        try:
            temp = urllib.urlopen(query_str).read()
        except:
            temp = 'ERROR'
        return temp

    def makeURLcall(serial, stationcode, testername, file_output, dowhat, result):

        url1 = "some URL call with args"
        url2 = "some URL call with args"

        if dowhat == "IN":
            result = get_wip_info(url1)
        elif dowhat == "OUT":
            result = get_wip_info(url2)

        # Serialize appends to the shared output file across threads
        lock.acquire()

        report = open(file_output, "a")
        report.writelines("%s - %s\n" % (serial, result))
        report.close()

        lock.release()

        return


    testername = "arg1"
    stationcode = "arg2"
    dowhat = "OUT"
    result = "PASS"
    file_source = "sourcefile.txt"
    file_output = "resultfile.txt"

    readfile = open(file_source, "r")
    Data = readfile.readlines()

    threads = []

    for SNs in Data :
        SNs = SNs.strip()
        print SNs
        thread = threading.Thread(target=makeURLcall, args=(SNs, stationcode, testername, file_output, dowhat, result))
        thread.start()

        threads.append(thread)

    for thread in threads :
        thread.join()

Don't implement your own thread pool; use the one that ships with Python.

On Python 3, you can use concurrent.futures.ThreadPoolExecutor to get a thread pool explicitly; on Python 2.6 and higher, you can use multiprocessing.dummy, which exposes the same API as multiprocessing but is backed by threads instead of processes.

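For the Python 3 route, here is a minimal sketch using concurrent.futures; the fetch helper and the URL list are illustrative placeholders, not part of the original post:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen  # Python 3 location of urllib.urlopen

def fetch(url):
    # Download one URL, returning a marker on failure like get_wip_info does
    try:
        return urlopen(url).read()
    except IOError:
        return b'ERROR'

# Hypothetical list of URLs to fetch
urls = ["http://example.com/item/%d" % i for i in range(1000)]

# 32 worker threads service all 1000 jobs one after another,
# so the per-process thread limit is never approached
with ThreadPoolExecutor(max_workers=32) as executor:
    for page in executor.map(fetch, urls):
        print(len(page))
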
Of course, if you need to do CPU-bound work on the reference interpreter, CPython, you should use the real multiprocessing, not multiprocessing.dummy; Python threads are fine for I/O-bound work, but quite bad for CPU-bound work.

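To make the distinction concrete, a small sketch: moving CPU-bound work onto real processes is just a matter of importing Pool from multiprocessing rather than multiprocessing.dummy (the checksum function here is a hypothetical stand-in for CPU-bound work):

from contextlib import closing
from multiprocessing import Pool  # real processes; multiprocessing.dummy gives threads

def checksum(n):
    # Pure-Python CPU-bound work; with threads, the GIL would serialize this
    return sum(i * i for i in range(n))

if __name__ == '__main__':  # required where processes are spawned, e.g. Windows
    # Same Pool API as multiprocessing.dummy, so switching is a one-line change
    with closing(Pool(4)) as pool:
        print(pool.map(checksum, [10 ** 6] * 8))
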
The code below replaces your explicit use of threads with a multiprocessing.dummy pool: a fixed number of workers, each of which finishes one task and pulls the next as fast as it can, instead of an unbounded number of threads with one job each. First, since the local I/O is likely quite cheap and you want the output synchronized, we'll have the worker task return the result data rather than write it itself, and let the main thread do the writes to local disk; that removes the need for the lock, and for opening the file over and over. This changes makeURLcall to:

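The rewritten makeURLcall did not survive in this copy of the answer; the sketch below reconstructs it from the description above, assuming the URL-building logic is unchanged. It takes a single tuple (Pool.imap_unordered passes one object per task) and returns the output line instead of writing it:

def makeURLcall(args):
    # imap_unordered hands each task to a worker as one object,
    # so unpack the argument tuple here
    serial, stationcode, testername, dowhat, result = args

    url1 = "some URL call with args"
    url2 = "some URL call with args"

    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)

    # Return the line; the main thread owns the output file, so no
    # lock is needed and the file is opened exactly once
    return "%s - %s\n" % (serial, result)
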
Now, for the code that replaces the explicit use of threads:

import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create pool
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be the
# sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define generator that creates tuples of arguments to pass to makeURLcall
    # We also read the file in lazily instead of using readlines, to
    # start producing results faster
    tasks = ((SNs.strip(), stationcode, testername, dowhat, result) for SNs in inf)
    # Pulls and writes results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))

# Once we leave the with block, input and output files are closed, and
# pool workers are cleaned up

Hi, thanks a lot for your reply. Is it as simple as adding multiprocessing.Pool(1000)? Does that mean I can now create 1000 threads?

@BarathanR: No. Your system evidently supports no more than roughly 800 threads; whether that's down to a ulimit or to not having enough room for that many thread stacks is beside the point, it's simply your limit. The point of using a pool is to have a fixed number of workers well under that cap. In practice, even for an I/O-bound task, you're unlikely to gain anything from 800 threads running at once; you'd have to test, but I suspect the biggest wins from parallelism come somewhere between 16 and 128 threads, since at some point you saturate the network anyway. When a worker finishes a task, it just grabs the next job and starts on it. I'll add some example code shortly.

@BarathanR: Code example added. I can't promise it will run as-is, since your own code isn't runnable, but it should be very close.

By the way, why do you need that many threads? Threads carry memory and CPU context-switching overhead, which will degrade performance; Windows, for instance, allocates a 1 MB stack for every thread. If you're requesting a bunch of URLs, you may want to consider doing asynchronous I/O instead. Take a look at
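For completeness, a rough sketch of the asynchronous-I/O approach that comment alludes to, using Python 3 and the third-party aiohttp package; aiohttp is an assumption here, since the link in the original comment did not survive:

import asyncio
import aiohttp  # third-party: pip install aiohttp

async def fetch(session, url):
    # One lightweight coroutine per URL; no OS thread per request
    try:
        async with session.get(url) as resp:
            return await resp.text()
    except aiohttp.ClientError:
        return 'ERROR'

async def main(urls):
    # A single session pools connections across all requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, u) for u in urls))

urls = ["http://example.com/item/%d" % i for i in range(1000)]
results = asyncio.run(main(urls))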