Python: How do I batch a list of URLs in GET requests?

python python-3.x concurrency

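I have a list of IDs that I need to pass to an API.

I have successfully turned the IDs into URL strings, so I now have a list of ~300k URLs (one per ID).

I want to get the text portion of each API response and collect the results in a list.

I can do this without the URL list by taking each ID and formatting it into the URL in a for loop like this: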

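import time
import requests

L = [1, 2, 3]
lst = []  # collects the text of each response

for i in L:
    url = 'url&Id={}'.format(i)  # placeholder URL
    xml_data1 = requests.get(url).text
    lst.append(xml_data1)
    time.sleep(1)  # wait one second between requests
    print(xml_data1)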

I have been trying to use concurrent.futures and urllib.request to send several requests at once, but I keep getting this error:

username=xxxx&password=xxxx&Id=1' generated an exception: 'HTTPResponse' object has no attribute 'readall'

Using this code:

import concurrent.futures
import urllib.request

lst = ['url.com', 'url2.com']  # placeholder URLs

URLS = lst

# Retrieve a single page and report the url and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.readall()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result() 
            # do json processing here
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
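The error message above says that the `HTTPResponse` object returned by `urllib.request.urlopen` has no `readall` method; the method that does exist is `read()`. A minimal sketch of the same thread-pool pattern with that one call changed might look like the following. The `fetch_all` helper, its `fetch` parameter, and the two returned dictionaries are illustrative names and are not part of the original code.

```python
import concurrent.futures
import urllib.request


def load_url(url, timeout):
    """Fetch one URL and return the raw response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()  # read() exists on HTTPResponse; readall() does not


def fetch_all(urls, fetch=load_url, max_workers=5, timeout=60):
    """Fetch every URL on a thread pool.

    Returns two dicts keyed by URL: the successful bodies and the
    exceptions raised by the failed requests.
    """
    results, errors = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the URL it was submitted for
        future_to_url = {executor.submit(fetch, url, timeout): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return results, errors
```

Calling `fetch_all(URLS)` would return the bodies as bytes; decoding them (for example with `.decode('utf-8')`, assuming the API responds in UTF-8) gives the text. The URLs left in `errors` can be passed to `fetch_all` again to retry only the failures.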

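How can I adapt my for loop, or the code above, to make several API calls at the same time?

I ask because my connection keeps getting reset while the for loop runs, and I don't know how to resume from where I left off in terms of ID or URL.

I am using Python 3.6.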
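One possible way to resume after a reset connection is to save the finished responses to disk and skip those IDs on the next run. The sketch below assumes the responses fit in a single JSON file; the function names, the `path` argument, and the `save_every` interval are all hypothetical, and `fetch` stands for whatever function performs one request.

```python
import json
import os


def load_done(path):
    """Return the {id: text} mapping saved by a previous run, or {} if none."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding='utf-8') as fh:
        return json.load(fh)


def save_done(done, path):
    """Write the {id: text} mapping to disk as JSON."""
    with open(path, 'w', encoding='utf-8') as fh:
        json.dump(done, fh)


def fetch_resumable(ids, fetch, path, save_every=100):
    """Fetch every ID that is not already saved in `path`.

    Progress is written every `save_every` requests and once more on exit,
    so an exception (such as a reset connection) loses no finished work.
    """
    done = load_done(path)
    pending = [i for i in ids if str(i) not in done]  # JSON keys are strings
    try:
        for n, i in enumerate(pending, start=1):
            done[str(i)] = fetch(i)
            if n % save_every == 0:
                save_done(done, path)
    finally:
        save_done(done, path)  # runs on success and on failure
    return done
```

With the loop from the question, `fetch` could be something like `lambda i: requests.get('url&Id={}'.format(i)).text`. If the run fails, calling `fetch_resumable` again with the same `path` continues with the IDs that are still missing.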

Edit:

I applied the code from here

where lst is the list of URLs:

import grequests


class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    # Named run_async because `async` is a reserved word from Python 3.7 on
    def run_async(self):
        results = grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)
        print(results)


test = Test()
test.run_async()

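The code seems to work and gives no error message, but how do I append each response.text to a list from this code?

I would suggest grequests, as recommended here:

It does not directly modify the code you already have, and you would probably need to rewrite your code around the different library, but it sounds better suited to your needs.

Following up on our discussion in the comments, see the code below, which shows what to change.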

import grequests

lst = ['https://www.google.com', 'https://www.google.cz']


class Test:
    def __init__(self):
        self.urls = lst

    def exception(self, request, exception):
        print("Problem: {}: {}".format(request.url, exception))

    # Named run_async because `async` is a reserved word from Python 3.7 on
    def run_async(self):
        # Return the responses instead of printing them
        return grequests.map((grequests.get(u) for u in self.urls), exception_handler=self.exception, size=5)

    def collate_responses(self, results):
        # Failed requests come back as None, so skip them
        return [x.text for x in results if x is not None]


test = Test()
# Here we collect the results returned by run_async()
results = test.run_async()
response_text = test.collate_responses(results)
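With a list as long as the ~300k URLs in the question, a single call only returns after every request has finished, so nothing is visible until the very end. One option is to send the URLs in smaller batches and collect the text after each one. The sketch below shows only the batching logic; `chunked`, `fetch_in_batches`, the `send_batch` parameter, and the default `batch_size` are hypothetical names and values chosen for illustration.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(urls, send_batch, batch_size=100):
    """Send `urls` batch by batch and return the collected response texts.

    `send_batch` takes a list of URLs and returns a list of response
    objects with a `.text` attribute, or None for a failed request.
    """
    texts = []
    for batch in chunked(urls, batch_size):
        responses = send_batch(batch)
        # Skip failed requests, which are represented by None
        texts.extend(r.text for r in responses if r is not None)
        print('collected {} responses so far'.format(len(texts)))
    return texts
```

With grequests, `send_batch` could be something like `lambda batch: grequests.map((grequests.get(u) for u in batch), size=5)`, so progress is printed after every batch rather than once at the end.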

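Thanks for the reply. I have applied that code before; please see my edit.

Without an IDE in front of me, I'm not sure. Can you show me what print(results) gives you?

That is the point where it prints nothing. As far as I understand, the code in my edit fetches all the requests first, and with about 300K URLs that takes a while. Let me shorten my list a bit.

Yes, I believe it maps the responses into a tuple or something similar. Try a test with only 3 or 4 different URLs. I believe it prints nothing because it is probably still generating that large set of responses. Perhaps split the comprehension into separate elements and print request.text each time a response arrives.

I just tested it with 5 URLs and got one with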