如何在Python中使用线程？_Python_Multithreading_Concurrency_Python Multithreading

如何在Python中使用线程？

python multithreading concurrency

如何在Python中使用线程？,python,multithreading,concurrency,python-multithreading,Python,Multithreading,Concurrency,Python Multithreading,我试图理解Python中的线程。我已经看过了文档和示例，但坦率地说，许多示例过于复杂，我很难理解它们如何清楚地显示为多线程划分的任务？注意：对于Python中的实际并行化，应该使用模块来分叉并行执行的多个进程（由于全局解释器锁，Python线程提供了交错，但实际上它们是串行执行的，而不是并行执行的，并且只有在交错I/O操作时才有用）但是，如果你只是在寻找交错（或者正在执行I/O操作，尽管全局解释器锁是可并行的），那么模块就是开始的地方。作为一个简单的例子，我们考虑通过将子串求和： impor

我试图理解Python中的线程。我已经看过了文档和示例，但坦率地说，许多示例过于复杂，我很难理解它们

如何清楚地显示为多线程划分的任务？

注意：对于Python中的实际并行化，应该使用模块来分叉并行执行的多个进程（由于全局解释器锁，Python线程提供了交错，但实际上它们是串行执行的，而不是并行执行的，并且只有在交错I/O操作时才有用）

但是，如果你只是在寻找交错（或者正在执行I/O操作，尽管全局解释器锁是可并行的），那么模块就是开始的地方。作为一个简单的例子，我们考虑通过将子串求和：

import threading

class SummingThread(threading.Thread):
     def __init__(self,low,high):
         super(SummingThread, self).__init__()
         self.low=low
         self.high=high
         self.total=0

     def run(self):
         for i in range(self.low,self.high):
             self.total+=i


thread1 = SummingThread(0,500000)
thread2 = SummingThread(500000,1000000)
thread1.start() # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print result

请注意，上面这是一个非常愚蠢的示例，因为它绝对不进行I/O操作，并且将串行执行，尽管由于全局解释器锁定（增加了上下文切换的开销）。

这里有一个简单的示例：您需要尝试几个备选URL，并返回第一个URL的内容进行响应

import Queue
import threading
import urllib2

# Called by each thread
def get_url(q, url):
    q.put(urllib2.urlopen(url).read())

theurls = ["http://google.com", "http://yahoo.com"]

q = Queue.Queue()

for u in theurls:
    t = threading.Thread(target=get_url, args = (q,u))
    t.daemon = True
    t.start()

s = q.get()
print s

在这种情况下，线程被用作一种简单的优化：每个子线程都在等待URL解析和响应，以将其内容放入队列；每个线程都是一个守护进程（如果主线程结束，则不会保持进程正常运行，这一点更为常见）；主线程启动所有子线程，在队列上执行

get

，等待其中一个执行了

put

，然后发出结果并终止（这将删除可能仍在运行的所有子线程，因为它们是守护进程线程）

Python中线程的正确使用总是与I/O操作相关联（因为CPython无论如何都不使用多个内核来运行CPU绑定的任务，所以线程的唯一原因是在等待某些I/O时不会阻塞进程）。顺便说一句，队列几乎总是将工作分配给线程和/或收集工作结果的最佳方式，而且它们本质上是线程安全的，因此它们可以避免您担心锁、条件、事件、信号量和其他线程间协调/通信概念。

与前面提到的其他概念一样，CPython只能使用线程y表示由于的I/O等待

如果您希望从多个内核中受益，以执行CPU限制的任务，请使用：

对我来说，线程的最佳示例是监视异步事件

# thread_test.py
import threading
import time

class Monitor(threading.Thread):
    def __init__(self, mon):
        threading.Thread.__init__(self)
        self.mon = mon

    def run(self):
        while True:
            if self.mon[0] == 2:
                print "Mon = 2"
                self.mon[0] = 3;

您可以通过打开会话并执行以下操作来使用此代码：

>>> from thread_test import Monitor
>>> a = [0]
>>> mon = Monitor(a)
>>> mon.start()
>>> a[0] = 2
Mon = 2
>>>a[0] = 2
Mon = 2

等几分钟

>>> a[0] = 2
Mon = 2

请注意：线程不需要队列

这是我能想到的最简单的例子，显示了10个进程同时运行

import threading
from random import randint
from time import sleep


def print_number(number):

    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print "Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds"

thread_list = []

for i in range(1, 10):

    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main process waited for threads to complete
print "Done"

但是，这里有一个我认为更有用的修改版本（至少对我来说）

更新：适用于Python2和Python3

try:
    # For Python 3
    import queue
    from urllib.request import urlopen
except:
    # For Python 2 
    import Queue as queue
    from urllib2 import urlopen

import threading

worker_data = ['http://google.com', 'http://yahoo.com', 'http://bing.com']

# Load up a queue with your data. This will handle locking
q = queue.Queue()
for url in worker_data:
    q.put(url)

# Define a worker function
def worker(url_queue):
    queue_full = True
    while queue_full:
        try:
            # Get your data off the queue, and do some work
            url = url_queue.get(False)
            data = urlopen(url).read()
            print(len(data))

        except queue.Empty:
            queue_full = False

# Create as many threads as you want
thread_count = 5
for i in range(thread_count):
    t = threading.Thread(target=worker, args = (q,))
    t.start()

我发现这非常有用：创建与内核一样多的线程，并让它们执行（大量）任务（在本例中，调用shell程序）：

自从2010年提出这个问题以来，如何使用和使用Python进行简单的多线程处理变得非常简单

下面的代码来自一篇文章/博客文章，您一定要查看它（无从属关系）。我将在下面进行总结-它最终只是几行代码：

from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(4)
results = pool.map(my_function, my_array)

这是的多线程版本：

results = []
for item in my_array:
    results.append(my_function(item))

说明

Map是一个很酷的小函数，是将并行性轻松注入Python代码的关键。对于那些不熟悉的人来说，Map是从Lisp等函数语言中提取出来的。它是一个将另一个函数映射到序列的函数

Map为我们处理序列上的迭代，应用函数，并将所有结果存储在末尾一个方便的列表中

实施

map函数的并行版本由两个库提供：multiprocessing，以及它鲜为人知但同样奇妙的单步子函数multiprocessing.dummy

multiprocessing.dummy

与多处理模块完全相同（-对CPU密集型任务使用多个进程；对I/O（和期间）使用线程）：

dummy复制了多处理的API，但只不过是线程模块的包装器

以及时间安排的结果：

Single thread:   14.4 seconds
       4 Pool:   3.1 seconds
       8 Pool:   1.4 seconds
      13 Pool:   1.3 seconds

传递多个参数（工作原理如下）：
要传递多个数组，请执行以下操作：

results = pool.starmap(function, zip(list_a, list_b))
或传递常数和数组：

results = pool.starmap(function, zip(itertools.repeat(constant), list_a))
如果您使用的是早期版本的Python，则可以通过（0）传递多个参数
（感谢您的有益评论。）
使用全新模块
执行器方法可能对所有以前接触过Java的人来说都很熟悉

还有一点需要注意：为了保持宇宙的正常运行，如果你没有将
与上下文一起使用（这是非常棒的，它可以为你做）给定一个函数，f ，请像这样执行它： import threading threading.Thread(target=f).start() 将参数传递给f threading.Thread(target=f, args=(a,b,c)).start() 下面是一个简单的多线程示例，这将很有帮助。您可以运行它并轻松理解多线程在Python中的工作方式。我使用了一个锁来阻止访问其他线程，直到前面的线程完成它们的工作。通过使用这行代码 tLock=threading.BoundedSemaphore（值=4）您可以一次允许多个进程，并保留将在以后或在完成以前的进程后运行的其余线程 import threading import time #tLock = threading.Lock() tLock = threading.BoundedSemaphore(value=4) def timer(name, delay, repeat): print "\r\nTimer: ", name, " Started" tLock.acquire() print "\r\n", name, " has the acquired the lock" while repeat > 0: time.sleep(delay) print "\r\n", name, ": ", str(time.ctime(time.time())) repeat -= 1 print "\r\n", name, " is releaseing the lock" tLock.release() print "\r\nTimer: ", name, " Completed" def Main(): t1 = threading.Thread(target=timer, args=("Timer1", 2, 5)) t2 = threading.Thread(target=timer, args=("Timer2", 3, 5)) t3 = threading.Thread(target=timer, args=("Timer3", 4, 5)) t4 = threading.Thread(target=timer, args=("Timer4", 5, 5)) t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5)) t1.start() t2.start() t3.start() t4.start() t5.start() print "\r\nMain Complete" if __name__ == "__main__": Main() 以前的解决方案都没有在我的GNU/Linux服务器上使用多个内核（我没有管理员权限）。他们只是在一个核心上运行我 def sqr(val): import time time.sleep(0.1) return val * val def process_result(result): print(result) def process_these_asap(tasks): import concurrent.futures with concurrent.futures.ProcessPoolExecutor() as executor: futures = [] for task in tasks: futures.append(executor.submit(sqr, task)) for future in concurrent.futures.as_completed(futures): process_result(future.result()) # Or instead of all this just do: # results = executor.map(sqr, tasks) # list(map(process_result, results)) def main(): tasks = list(range(10)) print('Processing {} tasks'.format(len(tasks))) process_these_asap(tasks) print('Done') return 0 if __name__ == '__main__': import sys sys.exit(main()) import threading threading.Thread(target=f).start() threading.Thread(target=f, args=(a,b,c)).start() import threading import time #tLock = threading.Lock() tLock = threading.BoundedSemaphore(value=4) def timer(name, delay, repeat): print "\r\nTimer: ", name, " Started" tLock.acquire() print "\r\n", name, " has the acquired the lock" while repeat > 0: time.sleep(delay) print "\r\n", name, ": ", str(time.ctime(time.time())) repeat -= 1 print "\r\n", name, " is releaseing the lock" tLock.release() print "\r\nTimer: ", name, " Completed" def Main(): t1 = threading.Thread(target=timer, args=("Timer1", 2, 5)) t2 = threading.Thread(target=timer, args=("Timer2", 3, 5)) t3 = threading.Thread(target=timer, args=("Timer3", 4, 5)) t4 = threading.Thread(target=timer, args=("Timer4", 5, 5)) t5 = threading.Thread(target=timer, args=("Timer5", 0.1, 5)) t1.start() t2.start() t3.start() t4.start() t5.start() print "\r\nMain Complete" if __name__ == "__main__": Main() from os import fork values = ['different', 'values', 'for', 'threads'] for i in range(len(values)): p = fork() if p == 0: my_function(values[i]) break import concurrent.futures import urllib.request URLS = ['http://www.foxnews.com/', 'http://www.cnn.com/', 'http://europe.wsj.com/', 'http://www.bbc.co.uk/', 'http://some-made-up-domain.com/'] # Retrieve a single page and report the URL and contents def load_url(url, timeout): with urllib.request.urlopen(url, timeout=timeout) as conn: return conn.read() # We can use a with statement to ensure threads are cleaned up promptly with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor: # Start the load operations and mark each future with its URL future_to_url = {executor.submit(load_url, url, 60): url for url in URLS} for future in concurrent.futures.as_completed(future_to_url): url = future_to_url[future] try: data = future.result() except Exception as exc: print('%r generated an exception: %s' % (url, exc)) else: print('%r page is %d bytes' % (url, len(data))) import concurrent.futures import math PRIMES = [ 112272535095293, 112582705942171, 112272535095293, 115280095190773, 115797848077099, 1099726899285419] def is_prime(n): if n % 2 == 0: return False sqrt_n = int(math.floor(math.sqrt(n))) for i in range(3, sqrt_n + 1, 2): if n % i == 0: return False return True def main(): with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)): print('%d is prime: %s' % (number, prime)) if __name__ == '__main__': main() from concurrent.futures import ThreadPoolExecutor, as_completed def get_url(url): # Your actual program here. Using threading.Lock() if necessary return "" # List of URLs to fetch urls = ["url1", "url2"] with ThreadPoolExecutor(max_workers = 5) as executor: # Create threads futures = {executor.submit(get_url, url) for url in urls} # as_completed() gives you the threads once finished for f in as_completed(futures): # Get the results rs = f.result() import math import timeit import threading import multiprocessing from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor def time_stuff(fn): """ Measure time of execution of a function """ def wrapper(*args, **kwargs): t0 = timeit.default_timer() fn(*args, **kwargs) t1 = timeit.default_timer() print("{} seconds".format(t1 - t0)) return wrapper def find_primes_in(nmin, nmax): """ Compute a list of prime numbers between the given minimum and maximum arguments """ primes = [] # Loop from minimum to maximum for current in range(nmin, nmax + 1): # Take the square root of the current number sqrt_n = int(math.sqrt(current)) found = False # Check if the any number from 2 to the square root + 1 divides the current numnber under consideration for number in range(2, sqrt_n + 1): # If divisible we have found a factor, hence this is not a prime number, lets move to the next one if current % number == 0: found = True break # If not divisible, add this number to the list of primes that we have found so far if not found: primes.append(current) # I am merely printing the length of the array containing all the primes, but feel free to do what you want print(len(primes)) @time_stuff def sequential_prime_finder(nmin, nmax): """ Use the main process and main thread to compute everything in this case """ find_primes_in(nmin, nmax) @time_stuff def threading_prime_finder(nmin, nmax): """ If the minimum is 1000 and the maximum is 2000 and we have four workers, 1000 - 1250 to worker 1 1250 - 1500 to worker 2 1500 - 1750 to worker 3 1750 - 2000 to worker 4 so let’s split the minimum and maximum values according to the number of workers """ nrange = nmax - nmin threads = [] for i in range(8): start = int(nmin + i * nrange/8) end = int(nmin + (i + 1) * nrange/8) # Start the thread with the minimum and maximum split up to compute # Parallel computation will not work here due to the GIL since this is a CPU-bound task t = threading.Thread(target = find_primes_in, args = (start, end)) threads.append(t) t.start() # Don’t forget to wait for the threads to finish for t in threads: t.join() @time_stuff def processing_prime_finder(nmin, nmax): """ Split the minimum, maximum interval similar to the threading method above, but use processes this time """ nrange = nmax - nmin processes = [] for i in range(8): start = int(nmin + i * nrange/8) end = int(nmin + (i + 1) * nrange/8) p = multiprocessing.Process(target = find_primes_in, args = (start, end)) processes.append(p) p.start() for p in processes: p.join() @time_stuff def thread_executor_prime_finder(nmin, nmax): """ Split the min max interval similar to the threading method, but use a thread pool executor this time. This method is slightly faster than using pure threading as the pools manage threads more efficiently. This method is still slow due to the GIL limitations since we are doing a CPU-bound task. """ nrange = nmax - nmin with ThreadPoolExecutor(max_workers = 8) as e: for i in range(8): start = int(nmin + i * nrange/8) end = int(nmin + (i + 1) * nrange/8) e.submit(find_primes_in, start, end) @time_stuff def process_executor_prime_finder(nmin, nmax): """ Split the min max interval similar to the threading method, but use the process pool executor. This is the fastest method recorded so far as it manages process efficiently + overcomes GIL limitations. RECOMMENDED METHOD FOR CPU-BOUND TASKS """ nrange = nmax - nmin with ProcessPoolExecutor(max_workers = 8) as e: for i in range(8): start = int(nmin + i * nrange/8) end = int(nmin + (i + 1) * nrange/8) e.submit(find_primes_in, start, end) def main(): nmin = int(1e7) nmax = int(1.05e7) print("Sequential Prime Finder Starting") sequential_prime_finder(nmin, nmax) print("Threading Prime Finder Starting") threading_prime_finder(nmin, nmax) print("Processing Prime Finder Starting") processing_prime_finder(nmin, nmax) print("Thread Executor Prime Finder Starting") thread_executor_prime_finder(nmin, nmax) print("Process Executor Finder Starting") process_executor_prime_finder(nmin, nmax) main() Sequential Prime Finder Starting 9.708213827005238 seconds Threading Prime Finder Starting 9.81836523200036 seconds Processing Prime Finder Starting 3.2467174359990167 seconds Thread Executor Prime Finder Starting 10.228896902000997 seconds Process Executor Finder Starting 2.656402041000547 seconds from threading import Thread from project import app import csv def import_handler(csv_file_name): thr = Thread(target=dump_async_csv_data, args=[csv_file_name]) thr.start() def dump_async_csv_data(csv_file_name): with app.app_context(): with open(csv_file_name) as File: reader = csv.DictReader(File) for row in reader: # DB operation/query import_handler(csv_file_name) import threading import requests def send(): r = requests.get('https://www.stackoverlow.com') thread = [] t = threading.Thread(target=send()) thread.append(t) t.start() from concurrent.futures import ThreadPoolExecutor, as_completed from time import sleep, time def concurrent(max_worker): futures = [] tic = time() with ThreadPoolExecutor(max_workers=max_worker) as executor: futures.append(executor.submit(sleep, 2)) # Two seconds sleep futures.append(executor.submit(sleep, 1)) futures.append(executor.submit(sleep, 7)) futures.append(executor.submit(sleep, 3)) for future in as_completed(futures): if future.result() is not None: print(future.result()) print(f'Total elapsed time by {max_worker} workers:', time()-tic) concurrent(5) concurrent(4) concurrent(3) concurrent(2) concurrent(1) Total elapsed time by 5 workers: 7.007831811904907 Total elapsed time by 4 workers: 7.007944107055664 Total elapsed time by 3 workers: 7.003149509429932 Total elapsed time by 2 workers: 8.004627466201782 Total elapsed time by 1 workers: 13.013478994369507 #!/bin/python from multiprocessing.dummy import Pool from subprocess import PIPE,Popen import time import os # In the variable pool_size we define the "parallelness". # For CPU-bound tasks, it doesn't make sense to create more Pool processes # than you have cores to run them on. # # On the other hand, if you are using I/O-bound tasks, it may make sense # to create a quite a few more Pool processes than cores, since the processes # will probably spend most their time blocked (waiting for I/O to complete). pool_size = 8 def do_ping(ip): if os.name == 'nt': print ("Using Windows Ping to " + ip) proc = Popen(['ping', ip], stdout=PIPE) return proc.communicate()[0] else: print ("Using Linux / Unix Ping to " + ip) proc = Popen(['ping', ip, '-c', '4'], stdout=PIPE) return proc.communicate()[0] os.system('cls' if os.name=='nt' else 'clear') print ("Running using threads\n") start_time = time.time() pool = Pool(pool_size) website_names = ["www.google.com","www.facebook.com","www.pinterest.com","www.microsoft.com"] result = {} for website_name in website_names: result[website_name] = pool.apply_async(do_ping, args=(website_name,)) pool.close() pool.join() print ("\n--- Execution took {} seconds ---".format((time.time() - start_time))) # Now we do the same without threading, just to compare time print ("\nRunning NOT using threads\n") start_time = time.time() for website_name in website_names: do_ping(website_name) print ("\n--- Execution took {} seconds ---".format((time.time() - start_time))) # Here's one way to print the final output from the threads output = {} for key, value in result.items(): output[key] = value.get() print ("\nOutput aggregated in a Dictionary:") print (output) print ("\n") print ("\nPretty printed output: ") for key, value in output.items(): print (key + "\n") print (value)