Python: triggering dynamic URLs with multiple threads

Tags: multithreading, python-3.x, python-multithreading

I'm new to threading in Python. I've read several posts, but I still don't really understand how to use it. Nevertheless, I tried to complete my task, and I want to check whether I'm going about it the right way.

The task is: read a large CSV containing around 20K records, take the id from each record, and make one HTTP API call per record.

import csv
import threading
import time

import requests

t1 = time.time()
file_data_obj = csv.DictReader(open(file_path, 'rU'))
threads = []
for record in file_data_obj:
    apiurl = "https://www.api-server.com?id=" + record.get("acc_id", "")
    thread = threading.Thread(target=requests.get, args=(apiurl,))
    thread.start()
    threads.append(thread)

t2 = time.time()

for thread in threads:
    thread.join()

print("Total time required to process a file - {} Secs".format(t2 - t1))
  • Since there are 20K records, will this start 20K threads? Or will the OS / Python take care of it? If so, can we limit the number of threads?
  • How do I collect the responses returned by requests.get?
  • Will t2 - t1 really give me the time needed to process the whole file?
Since there are 20K records, will it start 20K threads? Or will the OS / Python take care of it? If so, can we limit it?

Yes - it will start one thread per iteration. The maximum number of threads depends on your OS.
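If you do want to cap how many requests run at once while sticking to the threading module, a minimal sketch is a BoundedSemaphore (MAX_CONCURRENT is an assumed value, not from the question). All 20K threads are still created, but only MAX_CONCURRENT of them execute the HTTP request at the same time; the worker-pool approach shown further down avoids creating them all in the first place.

import threading

import requests

MAX_CONCURRENT = 50                       # assumed cap - tune to your needs
sem = threading.BoundedSemaphore(MAX_CONCURRENT)

def fetch(apiurl):
    # At most MAX_CONCURRENT threads perform the request at once;
    # the remaining threads block here until a slot frees up.
    with sem:
        return requests.get(apiurl)

# In the loop from the question, use target=fetch instead of target=requests.get:
# thread = threading.Thread(target=fetch, args=(apiurl,))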

How do I collect the responses returned by requests.get?

If you only want to use the threading module, you'll have to use a Queue. Threads return None by design, so you have to implement a line of communication between the threads and your main loop yourself.

from queue import Queue
from threading import Thread

import requests

# Queue used to collect the responses produced by the threads
q = Queue()
threads = []


def return_get(q, apiurl):
    # Perform the request and put the response object on the queue
    q.put(requests.get(apiurl))


for record in file_data_obj:
    apiurl = "https://www.api-server.com?id=" + record.get("acc_id", "")
    t = Thread(target=return_get, args=(q, apiurl))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

while not q.empty():
    r = q.get()  # Fetch the next response from the queue
    print(r.text)
Another approach is to use a worker pool:

from concurrent.futures import ThreadPoolExecutor

import requests

futures = []

# A pool with at most 10 worker threads
pool = ThreadPoolExecutor(10)

# Submit work to the pool
for record in file_data_obj:
    apiurl = "https://www.api-server.com?id=" + record.get("acc_id", "")
    futures.append(pool.submit(requests.get, apiurl))

for f in futures:
    print(f.result().text)
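On the timing question: in the snippet from the question, t2 is taken before the threads are joined, so t2 - t1 only measures how long it took to start the threads. A minimal sketch of timing the whole file is to take the end timestamp only after every thread (or future) has finished:

import time

t1 = time.time()

# ... start all threads / submit all work here ...

for thread in threads:
    thread.join()          # wait until every request has completed

t2 = time.time()           # only now does t2 - t1 cover the whole file
print("Total time required to process a file - {} Secs".format(t2 - t1))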
You can use concurrent.futures for this.

Retrieve a single page and report the URL and contents:

import urllib.request

# Fetch one URL and return the raw response body
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()
Create a pool executor with N worker threads:

import concurrent.futures

# URLS is the list of API URLs to fetch, N_workers the number of worker threads
with concurrent.futures.ThreadPoolExecutor(max_workers=N_workers) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
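To apply this to the CSV from the question (a sketch, assuming the same file_path, the acc_id field, and the same API endpoint; N_workers is whatever cap you pick), URLS can be built from the DictReader before handing it to the executor above:

import csv

# Build the list of API URLs from the CSV records
with open(file_path, newline='') as f:
    URLS = ["https://www.api-server.com?id=" + row.get("acc_id", "")
            for row in csv.DictReader(f)]

N_workers = 10   # assumed worker count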