Fastest parallel requests in Python


I need to constantly make many requests to about 150 APIs, on different servers. I work with trading, where time is crucial; I cannot waste 1 millisecond.

The solutions and problems I found were these:

  • Async using Asyncio: I do not want to rely on a single thread, because for some reason it may get stuck.
  • Threads: Is it really reliable on Python to use threads? Do I have the risk of 1 thread making
    another one get stuck?
  • Multiprocesses: If one process controls the others, would I lose too much time in inter-process communication?
Maybe a solution that uses all of that.

If there is no really good solution in Python, what should I use instead?

# Using Asyncio
import asyncio
import requests

async def main():
    loop = asyncio.get_running_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

asyncio.run(main())


# Using Threads
from threading import Thread

def do_api(url):
    #...
    #...
    pass

#...
#...
for i in range(50):
    t = Thread(target=do_api, args=(url_api[i],))
    t.start()
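The same fan-out can be expressed with concurrent.futures.ThreadPoolExecutor, which caps the thread count and joins the threads for you. A minimal sketch; do_api is stubbed out (its real body is elided above) and url_api is a hypothetical list:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def do_api(url):
    # placeholder for the real request logic elided above
    return url

# hypothetical stand-in for the real API URL list
url_api = ["http://example.com/api/{}".format(i) for i in range(50)]

with ThreadPoolExecutor(max_workers=50) as pool:
    # submit all 50 calls, then collect results as each thread finishes
    futures = [pool.submit(do_api, url) for url in url_api]
    results = [f.result() for f in as_completed(futures)]
```

Unlike the bare Thread loop, this also propagates exceptions raised inside do_api back to the caller via f.result().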

You should not use multithreading or asyncio.executor for this. Instead, use aiohttp, which is the equivalent of requests but with asynchronous support.

import asyncio
import aiohttp
import time

websites = """https://www.youtube.com
https://www.facebook.com
https://www.baidu.com
https://www.yahoo.com
https://www.amazon.com
https://www.wikipedia.org
http://www.qq.com
https://www.google.co.in
https://www.twitter.com
https://www.live.com
http://www.taobao.com
https://www.bing.com
https://www.instagram.com
http://www.weibo.com
http://www.sina.com.cn
https://www.linkedin.com
http://www.yahoo.co.jp
http://www.msn.com
http://www.uol.com.br
https://www.google.de
http://www.yandex.ru
http://www.hao123.com
https://www.google.co.uk
https://www.reddit.com
https://www.ebay.com
https://www.google.fr
https://www.t.co
http://www.tmall.com
http://www.google.com.br
https://www.360.cn
http://www.sohu.com
https://www.amazon.co.jp
http://www.pinterest.com
https://www.netflix.com
http://www.google.it
https://www.google.ru
https://www.microsoft.com
http://www.google.es
https://www.wordpress.com
http://www.gmw.cn
https://www.tumblr.com
http://www.paypal.com
http://www.blogspot.com
http://www.imgur.com
https://www.stackoverflow.com
https://www.aliexpress.com
https://www.naver.com
http://www.ok.ru
https://www.apple.com
http://www.github.com
http://www.chinadaily.com.cn
http://www.imdb.com
https://www.google.co.kr
http://www.fc2.com
http://www.jd.com
http://www.blogger.com
http://www.163.com
http://www.google.ca
https://www.whatsapp.com
https://www.amazon.in
http://www.office.com
http://www.tianya.cn
http://www.google.co.id
http://www.youku.com
https://www.example.com
http://www.craigslist.org
https://www.amazon.de
http://www.nicovideo.jp
https://www.google.pl
http://www.soso.com
http://www.bilibili.com
http://www.dropbox.com
http://www.xinhuanet.com
http://www.outbrain.com
http://www.pixnet.net
http://www.alibaba.com
http://www.alipay.com
http://www.chrome.com
http://www.booking.com
http://www.googleusercontent.com
http://www.google.com.au
http://www.popads.net
http://www.cntv.cn
http://www.zhihu.com
https://www.amazon.co.uk
http://www.diply.com
http://www.coccoc.com
https://www.cnn.com
http://www.bbc.co.uk
https://www.twitch.tv
https://www.wikia.com
http://www.google.co.th
http://www.go.com
https://www.google.com.ph
http://www.doubleclick.net
http://www.onet.pl
http://www.googleadservices.com
http://www.accuweather.com
http://www.googleweblight.com
http://www.answers.yahoo.com"""

async def get(url, session):
    try:
        async with session.get(url=url) as response:
            resp = await response.read()
            print("Successfully got url {} with resp of length {}.".format(url, len(resp)))
    except Exception as e:
        print("Unable to get url {} due to {}.".format(url, e.__class__))

async def main(urls):
    async with aiohttp.ClientSession() as session:
        ret = await asyncio.gather(*[get(url, session) for url in urls])
    print("Finalized all. Return is a list of len {} outputs.".format(len(ret)))

urls = websites.split("\n")
start = time.time()
asyncio.run(main(urls))
end = time.time()
print("Took {} seconds to pull {} websites.".format(end - start, len(urls)))
Output:

Successfully got url http://www.msn.com with resp of length 47967.
Successfully got url http://www.google.com.br with resp of length 14823.
Successfully got url https://www.t.co with resp of length 0.
Successfully got url http://www.google.es with resp of length 14798.
Successfully got url https://www.wikipedia.org with resp of length 66691.
Successfully got url http://www.google.it with resp of length 14805.
Successfully got url http://www.googleadservices.com with resp of length 1561.
Successfully got url http://www.cntv.cn with resp of length 3232.
Successfully got url https://www.example.com with resp of length 1256.
Successfully got url https://www.google.co.uk with resp of length 14184.
Successfully got url http://www.accuweather.com with resp of length 269.
Successfully got url http://www.google.ca with resp of length 14172.
Successfully got url https://www.facebook.com with resp of length 192898.
Successfully got url https://www.apple.com with resp of length 75422.
Successfully got url http://www.gmw.cn with resp of length 136136.
Successfully got url https://www.google.ru with resp of length 14803.
Successfully got url https://www.bing.com with resp of length 70314.
Successfully got url http://www.googleusercontent.com with resp of length 1561.
Successfully got url https://www.tumblr.com with resp of length 37500.
Successfully got url http://www.googleweblight.com with resp of length 1619.
Successfully got url https://www.google.co.in with resp of length 14230.
Successfully got url http://www.qq.com with resp of length 101957.
Successfully got url http://www.xinhuanet.com with resp of length 113239.
Successfully got url https://www.twitch.tv with resp of length 105014.
Successfully got url http://www.google.co.id with resp of length 14806.
Successfully got url https://www.linkedin.com with resp of length 90047.
Successfully got url https://www.google.fr with resp of length 14777.
Successfully got url https://www.google.co.kr with resp of length 14797.
Successfully got url http://www.google.co.th with resp of length 14783.
Successfully got url https://www.google.pl with resp of length 14769.
Successfully got url http://www.google.com.au with resp of length 14228.
Successfully got url https://www.whatsapp.com with resp of length 84551.
Successfully got url https://www.google.de with resp of length 14767.
Successfully got url https://www.google.com.ph with resp of length 14196.
Successfully got url https://www.cnn.com with resp of length 1135447.
Successfully got url https://www.wordpress.com with resp of length 216637.
Successfully got url https://www.twitter.com with resp of length 61869.
Successfully got url http://www.alibaba.com with resp of length 282210.
Successfully got url https://www.instagram.com with resp of length 20776.
Successfully got url https://www.live.com with resp of length 36621.
Successfully got url https://www.aliexpress.com with resp of length 37388.
Successfully got url http://www.uol.com.br with resp of length 463614.
Successfully got url https://www.microsoft.com with resp of length 230635.
Successfully got url http://www.pinterest.com with resp of length 87012.
Successfully got url http://www.paypal.com with resp of length 103763.
Successfully got url https://www.wikia.com with resp of length 237977.
Successfully got url http://www.sina.com.cn with resp of length 530525.
Successfully got url https://www.amazon.de with resp of length 341222.
Successfully got url https://www.stackoverflow.com with resp of length 190878.
Successfully got url https://www.ebay.com with resp of length 263256.
Successfully got url http://www.diply.com with resp of length 557848.
Successfully got url http://www.office.com with resp of length 111909.
Successfully got url http://www.imgur.com with resp of length 6223.
Successfully got url https://www.amazon.co.jp with resp of length 417751.
Successfully got url http://www.outbrain.com with resp of length 54481.
Successfully got url https://www.amazon.co.uk with resp of length 362057.
Successfully got url http://www.chrome.com with resp of length 223832.
Successfully got url http://www.popads.net with resp of length 14517.
Successfully got url https://www.youtube.com with resp of length 571028.
Successfully got url http://www.doubleclick.net with resp of length 130244.
Successfully got url https://www.yahoo.com with resp of length 510721.
Successfully got url http://www.tianya.cn with resp of length 7619.
Successfully got url https://www.netflix.com with resp of length 422277.
Successfully got url https://www.naver.com with resp of length 210175.
Successfully got url http://www.blogger.com with resp of length 94478.
Successfully got url http://www.soso.com with resp of length 5816.
Successfully got url http://www.github.com with resp of length 212285.
Successfully got url https://www.amazon.com with resp of length 442097.
Successfully got url http://www.go.com with resp of length 598355.
Successfully got url http://www.chinadaily.com.cn with resp of length 102857.
Successfully got url http://www.sohu.com with resp of length 216027.
Successfully got url https://www.amazon.in with resp of length 417175.
Successfully got url http://www.answers.yahoo.com with resp of length 104628.
Successfully got url http://www.jd.com with resp of length 18217.
Successfully got url http://www.blogspot.com with resp of length 94478.
Successfully got url http://www.fc2.com with resp of length 16997.
Successfully got url https://www.baidu.com with resp of length 301922.
Successfully got url http://www.craigslist.org with resp of length 59438.
Successfully got url http://www.imdb.com with resp of length 675494.
Successfully got url http://www.yahoo.co.jp with resp of length 37036.
Successfully got url http://www.onet.pl with resp of length 854384.
Successfully got url http://www.dropbox.com with resp of length 200591.
Successfully got url http://www.zhihu.com with resp of length 50543.
Successfully got url http://www.yandex.ru with resp of length 174347.
Successfully got url http://www.ok.ru with resp of length 206604.
Successfully got url http://www.163.com with resp of length 588036.
Successfully got url http://www.bbc.co.uk with resp of length 303267.
Successfully got url http://www.nicovideo.jp with resp of length 116124.
Successfully got url http://www.pixnet.net with resp of length 6448.
Successfully got url http://www.bilibili.com with resp of length 96941.
Successfully got url https://www.reddit.com with resp of length 718393.
Successfully got url http://www.booking.com with resp of length 472655.
Successfully got url https://www.360.cn with resp of length 79943.
Successfully got url http://www.taobao.com with resp of length 384755.
Successfully got url http://www.youku.com with resp of length 326873.
Successfully got url http://www.coccoc.com with resp of length 64687.
Successfully got url http://www.tmall.com with resp of length 137527.
Successfully got url http://www.hao123.com with resp of length 331222.
Successfully got url http://www.weibo.com with resp of length 93712.
Successfully got url http://www.alipay.com with resp of length 24057.
Finalized all. Return is a list of len 100 outputs.
Took 3.9256999492645264 seconds to pull 100 websites.
You can see that, on my internet connection (Miami, Florida), using aiohttp, 100 websites from across the world were reached successfully in about 4 seconds (with or without https). Keep in mind the following can slow the program down by a few milliseconds:

  • print statements (yes, including the ones placed in the code above)
  • reaching servers farther away from your geographical location

The example above contains both of these, and therefore it is arguably the least-optimized way of doing what you asked. However, I believe it is a great start for what you are looking for.
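One way to shave off the per-request print overhead mentioned above is to return the result from each coroutine and report once at the end. A minimal sketch of that variant (the two-URL list is a hypothetical stand-in for the 100-site string above):

```python
import asyncio
import aiohttp

# hypothetical stand-in for the full website list above
urls = ["https://www.example.com", "https://www.example.org"]

async def get(url, session):
    try:
        async with session.get(url=url) as response:
            resp = await response.read()
            return (url, len(resp))   # return instead of printing per request
    except Exception as e:
        return (url, e)               # record the failure, report it later

async def main(urls):
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*[get(url, session) for url in urls])
    for url, outcome in results:      # one reporting pass at the very end
        print(url, outcome)
    return results

# asyncio.run(main(urls))
```

Each coroutine now does only network work inside the timed section; all console I/O happens after gather returns.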


Edit: April 6th, 2021

Please note that in the above code we are querying multiple (different) servers, and so the use of a single
ClientSession
might degrade its performance:

Session encapsulates a connection pool (connector instance) and supports keepalives by default. Unless you are connecting to a large, unknown number of different servers over the lifetime of your application, it is suggested you use a single session for the lifetime of your application to benefit from connection pooling. (from the aiohttp documentation)

If your plan is to query an amount of known servers, defaulting to a single ClientSession is probably best. I've modified the answer to use a single ClientSession, since I believe most people finding use for this answer won't be querying different (unknown) servers at once, but this is worth keeping in mind in case you are doing what the OP originally asked for.
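If you really are talking to many different, unknown hosts, a middle ground is one ClientSession per host, so each host still gets its own keepalive connection pool. A minimal sketch of that idea (fetch_all and the URL list are hypothetical, not part of the answer above):

```python
import asyncio
from urllib.parse import urlsplit

import aiohttp

# hypothetical URL list for illustration: two hosts, three URLs
urls = ["https://www.example.com/a",
        "https://www.example.com/b",
        "https://www.example.org/c"]

async def fetch_all(urls):
    sessions = {}                           # one ClientSession per host
    try:
        async def get(url):
            host = urlsplit(url).netloc
            if host not in sessions:        # single-threaded event loop, no
                sessions[host] = aiohttp.ClientSession()  # await in between: no race
            async with sessions[host].get(url) as response:
                return (url, len(await response.read()))
        # return_exceptions=True keeps one unreachable host from cancelling the rest
        return await asyncio.gather(*[get(u) for u in urls], return_exceptions=True)
    finally:
        await asyncio.gather(*[s.close() for s in sessions.values()])

# asyncio.run(fetch_all(urls))
```

Requests to the same host reuse that host's pooled connections, while unrelated hosts no longer share one pool.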

Q: "Fastest parallel requests in Python ... I cannot waste 1 millisecond"

One can easily spend 5x more time on doing the same amount of work, if a wrong approach was selected. Check the [Epilogue] section below to see one such exemplified code (an MCVE example), where any of the Threads and/or Processes were way slower than a pure [SERIAL] form of the process-execution:

>>> pass; anAsyncEventLOOP = asyncio.get_event_loop()
>>> aClk.start(); anAsyncEventLOOP.run_until_complete( mainAsyncLoopPAYLOAD_wrapper( anAsyncEventLOOP, 25 ) ); aClk.stop()
Now finished urlGetCOROUTINE(launch_id:<11>) E2E execution took 246193 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<21>) E2E execution took 247013 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 2>) E2E execution took 237278 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<20>) E2E execution took 247111 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<23>) E2E execution took 252462 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<16>) E2E execution took 237591 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 1>) E2E execution took 243398 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 9>) E2E execution took 232643 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 6>) E2E execution took 247308 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<17>) E2E execution took 250773 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<24>) E2E execution took 245354 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<10>) E2E execution took 259812 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<13>) E2E execution took 241707 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 3>) E2E execution took 258745 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 4>) E2E execution took 243659 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<18>) E2E execution took 249252 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 8>) E2E execution took 245812 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<12>) E2E execution took 244684 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 5>) E2E execution took 257701 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<15>) E2E execution took 243001 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 7>) E2E execution took 256776 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<22>) E2E execution took 266979 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<14>) E2E execution took 252169 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:< 0>) E2E execution took 263190 [us](Safety anAsyncTIMEOUT was set 10 [s])
Now finished urlGetCOROUTINE(launch_id:<19>) E2E execution took 247591 [us](Safety anAsyncTIMEOUT was set 10 [s])
273829
pass;    import aiohttp, asyncio, async_timeout
from zmq import Stopwatch

async def urlGetCOROUTINE( aSESSION, anURL2GET, aCoroID = -1, anAsyncTIMEOUT = 10 ):
    aLocalCLK = Stopwatch()
    res       = ""
    ############################################# SECTION-UNDER-TEST
    aLocalCLK.start() ##############################################
    async with async_timeout.timeout( anAsyncTIMEOUT ):# RESPONSE ## TIMEOUT-PROTECTED
         async  with aSESSION.get( anURL2GET ) as aRESPONSE:
            while True:
                    pass;  aGottenCHUNK = await   aRESPONSE.content.read( 1024 )
                    if not aGottenCHUNK:
                        break
                    res += str( aGottenCHUNK )
            await                                 aRESPONSE.release()
    ################################################################ TIMEOUT-PROTECTED
    aTestRunTIME_us = aLocalCLK.stop() ########## SECTION-UNDER-TEST

    print( "Now finished urlGetCOROUTINE(launch_id:<{2: >2d}>) E2E execution took {0: >9d} [us](Safety anAsyncTIMEOUT was set {1: >2d} [s])".format( aTestRunTIME_us, anAsyncTIMEOUT, aCoroID ) )
    return ( aTestRunTIME_us, len( res ) )

async def mainAsyncLoopPAYLOAD_wrapper( anAsyncLOOP_to_USE, aNumOfTESTs = 10, anUrl2GoGET = "http://example.com" ):
    '''
    aListOfURLs2GET = [ "https://www.irs.gov/pub/irs-pdf/f1040.pdf",
                        "https://www.forexfactory.com/news",
                         ...
                         ]
    '''
    async with aiohttp.ClientSession( loop = anAsyncLOOP_to_USE ) as aSESSION:
        aBlockOfAsyncCOROUTINEs_to_EXECUTE = [ urlGetCOROUTINE(      aSESSION, anUrl2GoGET, launchID ) for launchID in range( min( aNumOfTESTs, 1000 ) ) ]
        await asyncio.gather( *aBlockOfAsyncCOROUTINEs_to_EXECUTE )
                                                                                                                                                                              602283L _ _ _ _ _ _ _ _ _
>>> aClk.start(); len( str( Parallel( n_jobs = -1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   512459L [PAR]   QUAD-CORE .multiprocessing
>>> aClk.start(); len( str( Parallel( n_jobs = -1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   511655L
>>> aClk.start(); len( str( Parallel( n_jobs = -1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   506400L
>>> aClk.start(); len( str( Parallel( n_jobs = -1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   508031L
>>> aClk.start(); len( str( Parallel( n_jobs = -1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   514377L _ _ _ _ _ _ _ _ _

>>> aClk.start(); len( str( Parallel( n_jobs =  1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   123185L [PAR] SINGLE-CORE
>>> aClk.start(); len( str( Parallel( n_jobs =  1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   122631L
>>> aClk.start(); len( str( Parallel( n_jobs =  1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   125139L
>>> aClk.start(); len( str( Parallel( n_jobs =  1                        )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   124358L _ _ _ _ _ _ _ _ _

>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   213990L [PAR]   QUAD-CORE .threading
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   201337L
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   199485L
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   198174L
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   169204L
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   168658L
>>> aClk.start(); len( str( Parallel( n_jobs = -1, backend = 'threading' )( delayed( np.math.factorial ) ( 2**f ) for f in range( 14 ) ) [-1] ) ); aClk.stop()        28504   171793L _ _ _ _ _ _ _ _ _

>>> aClk.start(); len( str(                                                        [ np.math.factorial(    2**f ) for f in range( 14 ) ] [-1] ) ); aClk.stop()        28504   121401L [SEQ] SINGLE-CORE
                                                                                                                                                                              126381L
import aiohttp, asyncio, async_timeout
import time


async def get_url(session, url, corou_id=-1, timeout=10):
    start = time.time()
    res = ""
    # SECTION-UNDER-TEST
    async with async_timeout.timeout(timeout):  # TIMEOUT-PROTECTED
        async with session.get(url) as response:
            while True:
                chunk = await response.content.read(1024)
                if not chunk:
                    break
                res += str(chunk)
            await response.release()
    end = time.time()
    runtime_us = int((end - start) * 1E6)  # report in [us]; the "d" format below needs an int

    print(
        "Now finished get_url(launch_id:<{2: >2d}>) E2E execution took {0: >9d} [us](Safety timeout was set {1: >2d} [s])".format(
            runtime_us, timeout, corou_id))
    return runtime_us, len(res)


async def async_payload_wrapper(async_loop, number_of_tests=10, url="http://example.com"):
    '''
    urls = [ "https://www.irs.gov/pub/irs-pdf/f1040.pdf",
                        "https://www.forexfactory.com/news",
                         ...
                         ]
    '''
    async with aiohttp.ClientSession(loop=async_loop) as session:
        corou_to_execute = [get_url(session, url, launchID)
                            for launchID in range(min(number_of_tests, 1000))]
        await asyncio.gather(*corou_to_execute)


if __name__ == '__main__':
    event_loop = asyncio.get_event_loop()
    event_loop.run_until_complete(async_payload_wrapper(event_loop, 25))