
Python: checking a proxy list with aiohttp/async

Tags: python, python-3.x, asynchronous, python-asyncio, aiohttp

I'm trying to use aiohttp and asyncio to fetch a list of proxies and check whether they work with a specific URL (-> status code 200). If they do, I want to add them to a new list of working proxies. I did this with requests before and it worked fine, but it was very slow, so I'm trying to make it work asynchronously. I got the scraping part working, but I can't get the checking part to run:

from bs4 import BeautifulSoup
import random
import asyncio
import aiohttp

URL1 = 'https://free-proxy-list.net/'
URL2 = 'https://api.proxyscrape.com/v2/?request=getproxies&protocol=http&timeout=2900&country=all&ssl=all&anonymity=elite&simplified=true'


async def proxy_db():
    async with aiohttp.ClientSession() as session:
        async with session.get(URL1) as resp1:
            text1 = await resp1.read()
            soup1 = BeautifulSoup(text1.decode('utf-8'), 'html.parser')
            proxy_list_fpl = []
            for items1 in soup1.select("#proxylisttable tbody tr"):
                proxy_list_fpl.append(':'.join([item.text for item in items1.select("td")[:2]]))
            print(len(proxy_list_fpl))
        async with session.get(URL2) as resp2:
            text2 = await resp2.read()
            soup2 = BeautifulSoup(text2.decode('utf-8'), 'html.parser')
            proxy_list_ps = []
            for items2 in soup2:
                proxy_list_ps = items2.split()
            print(len(proxy_list_ps))
    templist = list(set(proxy_list_fpl + proxy_list_ps))
    proxy_list = ["http://" + s for s in templist]
    print(len(proxy_list))
    return proxy_list

loop = asyncio.get_event_loop()
proxies = loop.run_until_complete(proxy_db())
print(proxies)
loop.close()

# Until here it works fine. I'm new to Python and asyncio, so there might be a more
# efficient way of coding this, but it already saved 50% of the time compared to my
# previous requests-based method.


working_proxy = []


async def fetch(session, url, proxy):
    async with session.get(url, proxy = proxies) as response:
        if response.status != 200:
            response.raise_for_status()
        return await response.text

async def fetch_all(session, url, proxy):
    tasks = []
    for proxy in proxies:
        task = asyncio.create_task(fetch(session, url, proxy))
        tasks.append(task)
    results = await asyncio.gather(*tasks)
    return results

async def main():
    url = "http://httpbin.org/ip"
    proxy = proxies
    async with aiohttp.ClientSession() as session:
        page = await fetch_all(session, url, proxy)
        if page.status == 200:
            working_proxy.append(proxies)
            print(len(working_proxy))

if __name__ == "__main__":
    asyncio.run(main())
This results in:

Traceback (most recent call last):
  File "/Users/xxx/Dropbox/Python/5APR/Web_Scraping/asyncio_test.py", line 179, in <module>
    asyncio.run(main())
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/Users/xxx/Dropbox/Python/5APR/Web_Scraping/asyncio_test.py", line 173, in main
    page = await fetch_all(session, url, proxy)
  File "/Users/xxx/Dropbox/Python/5APR/Web_Scraping/asyncio_test.py", line 166, in fetch_all
    results = await asyncio.gather(*tasks)
  File "/Users/xxx/Dropbox/Python/5APR/Web_Scraping/asyncio_test.py", line 156, in fetch
    async with session.get(url, proxy = proxies) as response:
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/aiohttp/client.py", line 415, in _request
    proxy = URL(proxy)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/yarl/_url.py", line 158, in __new__
    raise TypeError("Constructor parameter should be str")
TypeError: Constructor parameter should be str

Process finished with exit code 1

I would greatly appreciate any ideas or hints on how to get this running. I'm new to Python and to coding in general, so I'm also happy about any tips regarding bad practice/style or more efficient ways of coding. Thanks in advance!

I would blindly say that proxy=proxies is the problem. That parameter only takes a single proxy, not several. That's why you get the error: it receives the whole list of proxies instead of a string representing one proxy.

@MikaelÖhman Thanks for the reply! I changed the proxy list to a fixed proxy address, but unfortunately it still results in the same error.

Have you verified that the single proxy you pass in is actually a str, and not just a list containing one element?
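Building on that diagnosis: aiohttp's proxy argument expects one proxy URL string per request, so each task needs to receive its own entry from the list. Below is a minimal, untested sketch of how the checking part could look; the proxies list is a placeholder standing in for the result of proxy_db() above, and TEST_URL, check_proxy, and check_all are illustrative names. Note also that response.text is a coroutine method in aiohttp, so it would have to be called as response.text():

import asyncio
import aiohttp

TEST_URL = "http://httpbin.org/ip"  # echoes the requesting IP

async def check_proxy(session, proxy):
    # 'proxy' must be a single "http://host:port" string, not a list
    try:
        async with session.get(TEST_URL, proxy=proxy,
                               timeout=aiohttp.ClientTimeout(total=10)) as response:
            if response.status == 200:
                return proxy  # the proxy answered: keep it
    except (aiohttp.ClientError, asyncio.TimeoutError):
        pass  # dead or slow proxy: treat it as not working
    return None

async def check_all(proxies):
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            *(check_proxy(session, p) for p in proxies))
    # drop the proxies that failed or timed out
    return [p for p in results if p is not None]

if __name__ == "__main__":
    proxies = ["http://1.2.3.4:8080", "http://5.6.7.8:3128"]  # placeholder list
    working_proxies = asyncio.run(check_all(proxies))
    print(f"{len(working_proxies)} of {len(proxies)} proxies returned status 200")

Catching the exceptions inside check_proxy means one unreachable proxy does not cancel the whole batch; passing return_exceptions=True to asyncio.gather would be another way to achieve the same.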
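As for the request for style tips: in the scraping part the two sources are fetched one after the other, although they are independent and could run concurrently. A rough, untested sketch under that assumption follows (fetch_fpl and fetch_ps are illustrative names; proxyscrape already returns plain text, so BeautifulSoup is only needed for the first source). On Python 3.9, asyncio.run() is also the simpler entry point and avoids mixing loop.run_until_complete with a later asyncio.run call in the same script:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

URL1 = "https://free-proxy-list.net/"
URL2 = ("https://api.proxyscrape.com/v2/?request=getproxies&protocol=http"
        "&timeout=2900&country=all&ssl=all&anonymity=elite&simplified=true")

async def fetch_fpl(session):
    # free-proxy-list.net serves an HTML table: join the IP and port cells
    async with session.get(URL1) as resp:
        soup = BeautifulSoup(await resp.text(), "html.parser")
        return [":".join(td.text for td in row.select("td")[:2])
                for row in soup.select("#proxylisttable tbody tr")]

async def fetch_ps(session):
    # proxyscrape returns plain text with one ip:port entry per line
    async with session.get(URL2) as resp:
        return (await resp.text()).split()

async def proxy_db():
    async with aiohttp.ClientSession() as session:
        fpl, ps = await asyncio.gather(fetch_fpl(session), fetch_ps(session))
    # deduplicate and prefix the scheme, as in the original code
    return ["http://" + s for s in set(fpl + ps)]

if __name__ == "__main__":
    print(len(asyncio.run(proxy_db())))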