Running parallel requests sessions in Python
I am trying to open multiple web sessions and save the data to CSV files. I wrote the code with a for loop and requests.get, but accessing 90 web locations takes a long time. Can anyone tell me how to run the whole loc_var process in parallel?
The code works fine. The only problem is that the loc_var entries are processed one after another, which takes a long time.
I would like to access all the loc_var URLs from the for loop in parallel and write the CSV files.
The code is below:
import pandas as pd
import numpy as np
import os
import requests
import datetime
import zipfile
t=datetime.date.today()-datetime.timedelta(2)
server = [("A","web1",":5000","username=usr&password=p7Tdfr")]
'''List of all web_ips'''
web_1 = ["Web1","Web2","Web3","Web4","Web5","Web6","Web7","Web8","Web9","Web10","Web11","Web12","Web13","Web14","Web15"]
'''List of All location'''
loc_var =["post1","post2","post3","post4","post5","post6","post7","post8","post9","post10","post11","post12","post13","post14","post15","post16","post17","post18"]
for s,web,port,usr in server:
    login_url='http://'+web+port+'/api/v1/system/login/?'+usr
    print(login_url)
    s= requests.session()
    login_response = s.post(login_url)
    print("login Response",login_response)
    # Start accessing the web for each loc_var entry
    for mkt in loc_var:
        # Output is a CSV file
        com_actions_url='http://'+web+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
        print("com_action_url",com_actions_url)
        r = s.get(com_actions_url)
        print("action",r)
        if r.ok == True:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv"%mkt),'wb') as f:
                f.write(r.content)
        # If the location is not accessible, try the hosts in the web_1 list
        if r.ok == False:
            while r.ok == False:
                for web_2 in web_1:
                    login_url='http://'+web_2+port+'/api/v1/system/login/?'+usr
                    com_actions_url='http://'+web_2+port+'/api/v1/3E+date(%5C%22'+str(t)+'%5C%22)and+location+%3D%3D+%27'+mkt+'%27%22&page_size=-1&format=%22csv%22'
                    login_response = s.post(login_url)
                    print("login Response",login_response)
                    print("com_action_url",com_actions_url)
                    r = s.get(com_actions_url)
                    if r.ok == True:
                        with open(os.path.join("/home/Reports_DC/", "relation_%s.csv"%mkt),'wb') as f:
                            f.write(r.content)
                        break
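There are several ways to make concurrent HTTP requests. The two I have used are (1) multiple threads and (2) sending the requests asynchronously.
To send requests in parallel with a thread pool, first build the list of URLs you want to fetch in parallel (in your case, the login_url and com_actions_url values), then request all of them concurrently like this: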
from concurrent.futures import ThreadPoolExecutor
import requests

def fetch(url):
    page = requests.get(url)
    return page.text
    # Catch HTTP errors/exceptions here

pool = ThreadPoolExecutor(max_workers=5)
urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com'] # Create a list of urls

for page in pool.map(fetch, urls):
    # Do whatever you want with the results ...
    print(page[0:100])
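To connect this to the loop in the question, here is a minimal sketch of how the per-location work could be handed to a thread pool. The names fetch_location, download_all, make_url and get are hypothetical and not part of the original code: make_url(host, mkt) is assumed to build the report URL for one location, and get(url) is assumed to return an (ok, body) pair, for example by wrapping session.get from requests.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_location(mkt, hosts, make_url, get):
    """Try each host in order; return the first successful body, else None."""
    for host in hosts:
        try:
            ok, body = get(make_url(host, mkt))
        except Exception as exc:
            # A network error on one host should not stop the fallback chain.
            print(f"{mkt}: {host} failed: {exc}")
            continue
        if ok:
            return body
    return None


def download_all(locations, hosts, make_url, get, out_dir, max_workers=10):
    """Fetch all locations concurrently and write one CSV file per location."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `locations`.
        bodies = pool.map(
            lambda m: fetch_location(m, hosts, make_url, get), locations
        )
        for mkt, body in zip(locations, bodies):
            if body is None:
                print(f"{mkt}: no host returned data")
                continue
            path = os.path.join(out_dir, f"relation_{mkt}.csv")
            with open(path, "wb") as f:
                f.write(body)
            saved.append(path)
    return saved
```

Here the downloads run in the worker threads while the files are written in the main thread, so no two threads write to the same file. Unlike the while loop in the question, the fallback gives up after one pass over the hosts instead of retrying forever. As far as I know, requests does not document Session as thread-safe, so the cautious choice is to give each worker its own logged-in session rather than sharing one.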
Using asyncio/aiohttp is generally faster than the threading approach above, but the learning curve is steeper. Here is a simple example (Python 3.7+):

import asyncio
import aiohttp

urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.bing.com']

async def fetch(session, url):
    async with session.get(url) as resp:
        return await resp.text()
        # Catch HTTP errors/exceptions here

async def fetch_concurrent(urls):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for u in urls:
            # Schedule each download on the running event loop
            tasks.append(asyncio.create_task(fetch(session, u)))
        for result in asyncio.as_completed(tasks):
            page = await result
            # Do whatever you want with results
            print(page[0:100])

asyncio.run(fetch_concurrent(urls))

However, unless you are making a very large number of requests, the threading approach is probably sufficient (and easier to implement).

Comment: I assume you pasted this from a Jupyter notebook, right?
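For the asyncio route, the "catch HTTP errors/exceptions here" step could be handled roughly as in the sketch below. It is an illustration under assumptions, not part of the original answer: fetch_all and its limit parameter are hypothetical names, and fetch is assumed to be any coroutine function that takes a URL, such as a wrapper around the aiohttp call shown above. A semaphore caps how many requests are in flight, and failures are returned next to the successful results instead of cancelling the whole batch.

```python
import asyncio


async def fetch_all(urls, fetch, limit=10):
    """Run fetch(url) for every URL with at most `limit` running at once.

    Returns a dict mapping each URL to its result, or to the exception
    that fetch raised for that URL.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(url):
        # Wait for a free slot before starting the request.
        async with sem:
            return await fetch(url)

    # return_exceptions=True keeps one failure from aborting the others.
    results = await asyncio.gather(
        *(guarded(u) for u in urls), return_exceptions=True
    )
    return dict(zip(urls, results))
```

The caller can then loop over the returned dict, write a CSV file for each successful entry, and retry or log the entries whose value is an exception.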