Python 3.x: How can I speed up this Python requests session?
Tags: python-3.x, pandas, python-requests

I am using Anaconda Python 3.5.2. I have a list of 280,000 URLs. I am scraping the data and trying to keep track of which URL each piece of data came from. I have made about 30,000 requests so far, and I am averaging 1 request per second.
import pandas as pd
import requests
from pandas.io.json import json_normalize

# url_list and results_csv are defined elsewhere
response_df = pd.DataFrame()
# create the session
with requests.Session() as s:
    # loop through the list of urls
    for url in url_list:
        # call the resource
        resp = s.get(url)
        # check the response
        if resp.status_code == requests.codes.ok:
            # create a new dataframe with the response
            ftest = json_normalize(resp.json())
            ftest['url'] = url
            response_df = response_df.append(ftest, ignore_index=True)
        else:
            print("Something went wrong! Hide your wife! Hide the kids!")
response_df.to_csv(results_csv)
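I ended up dropping requests and using asyncio and aiohttp instead. With requests I was getting about 1 request per second. The new approach averages around 5 per second while using only about 20% of my system resources. I ended up using something very similar to this: Also, this was helpful: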
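Please indent your code, and also consider profiling it. In addition, consider preallocating the output DF. Are the URLs static, or do their values depend on something such as the previous request?

Thanks, I will read up on profiling code and preallocating the output DF. As for the URLs, the base URL is static, but each URL is unique.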
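The comment about the output DF can be sketched as follows. `DataFrame.append` returns a new frame on every call, so appending inside the loop copies all accumulated rows each time. One common alternative is to collect the per-response pieces in a plain list and build the frame once at the end. This is a minimal sketch, not the asker's final code: `collect_responses`, `build_frame`, and the `fetch_json` callable are hypothetical names introduced here, and `pd.json_normalize` assumes pandas 1.0 or later.

```python
def collect_responses(url_list, fetch_json):
    """Call fetch_json(url) for each URL and keep the successful payloads.

    fetch_json is a hypothetical callable that returns the decoded JSON
    for a URL, or None when the request failed.
    """
    collected = []
    for url in url_list:
        payload = fetch_json(url)
        if payload is None:
            print("Request failed:", url)
            continue
        # appending to a list is cheap; no DataFrame is copied here
        collected.append((url, payload))
    return collected


def build_frame(collected):
    """Build one DataFrame from all (url, payload) pairs with a single concat."""
    # imported here so collect_responses itself has no pandas dependency
    import pandas as pd

    frames = []
    for url, payload in collected:
        frame = pd.json_normalize(payload)
        frame["url"] = url
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With requests, `fetch_json` could wrap `s.get(url)` and return `resp.json()` when `resp.status_code == requests.codes.ok`, and `None` otherwise. The result of `build_frame` can then be written out with `to_csv` as in the question.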
import asyncio
import os

import aiohttp
import async_timeout


async def download_coroutine(session, url):
    # give up on a single download after 10 seconds
    async with async_timeout.timeout(10):
        async with session.get(url) as response:
            filename = os.path.basename(url)
            with open(filename, 'wb') as f_handle:
                # stream the body to disk in 1 KiB chunks
                while True:
                    chunk = await response.content.read(1024)
                    if not chunk:
                        break
                    f_handle.write(chunk)
            # leaving the "async with" block releases the connection


async def main():
    urls = ["http://www.irs.gov/pub/irs-pdf/f1040.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040a.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040ez.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040es.pdf",
            "http://www.irs.gov/pub/irs-pdf/f1040sb.pdf"]
    async with aiohttp.ClientSession() as session:
        # each download is awaited before the next one starts
        for url in urls:
            await download_coroutine(session, url)


if __name__ == '__main__':
    # asyncio.run needs Python 3.7+; on 3.5/3.6 use
    # asyncio.get_event_loop().run_until_complete(main())
    asyncio.run(main())
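The loop in `main` above awaits each download before starting the next, so the requests still run one at a time. The following is a minimal sketch of issuing them concurrently while capping the number in flight, using `asyncio.gather` and `asyncio.Semaphore`. The names `gather_bounded`, `fetch_json`, and `scrape`, the default limit of 10, and the 10-second total timeout are assumptions for illustration and are not part of the original answer. `scrape` also assumes a recent aiohttp 3.x release that provides `aiohttp.ClientTimeout`.

```python
import asyncio
from functools import partial


async def gather_bounded(urls, fetch, limit=10):
    """Run fetch(url) for every URL with at most `limit` calls in flight.

    Returns a list of (url, result) pairs in input order; when a fetch
    raises, the exception object is stored as the result instead.
    """
    semaphore = asyncio.Semaphore(limit)

    async def worker(url):
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:
                # keep going; the caller can inspect failures afterwards
                return url, exc

    return await asyncio.gather(*(worker(url) for url in urls))


async def fetch_json(session, url):
    """Fetch one URL and decode its JSON body (session: aiohttp.ClientSession)."""
    async with session.get(url) as response:
        response.raise_for_status()
        return await response.json()


async def scrape(urls, limit=10):
    """Fetch all URLs through one shared aiohttp session."""
    # third-party import kept local so the helpers above need only the stdlib
    import aiohttp

    timeout = aiohttp.ClientTimeout(total=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        return await gather_bounded(urls, partial(fetch_json, session), limit)
```

On Python 3.7 or later, `asyncio.run(scrape(url_list))` would return the (url, result) pairs for the question's `url_list`. The limit is worth tuning against what the target server tolerates, since a very high value may trigger rate limiting.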