Encoding error when using proxybroker in Python 3.5


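I am trying to use proxybroker to generate a file containing active proxies for certain countries. I always get the same error when trying to fetch the proxies. The error appears to be an encoding/decoding error in a package used by proxybroker. But I suspect there may be a better way to use proxybroker.

This is the code that causes the problem: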

import asyncio

from proxybroker import Broker


def gather_proxies(countries):
    """
    This method uses the proxybroker package to asynchronously get two new proxies per specified country
    and returns the proxies as a list of country and proxy.

    :param countries: The ISO style country codes to fetch proxies for. Countries is a list of two-letter strings.
    :return: A list of proxies that are themselves lists with two parameters [location, proxy address].
    """
    proxy_list = []
    types = ['HTTP']
    for country in countries:
        loop = asyncio.get_event_loop()

        proxies = asyncio.Queue(loop=loop)
        broker = Broker(proxies, loop=loop)

        loop.run_until_complete(broker.find(limit=2, countries=country, types=types))

        # Drain the queue; a None entry marks the end of the results.
        while True:
            proxy = proxies.get_nowait()
            if proxy is None:
                break
            print(str(proxy))
            proxy_list.append([country, proxy.host + ":" + str(proxy.port)])
    return proxy_list

And the error message:

../app/main/download_thread.py:344: in update_proxies
proxy_list = gather_proxies(country_list)
../app/main/download_thread.py:368: in gather_proxies
    loop.run_until_complete(broker.find(limit=2, countries=country, types=types))
/usr/lib/python3.5/asyncio/base_events.py:387: in run_until_complete
    return future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
    raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:241: in _step
    result = coro.throw(exc)
../venv/lib/python3.5/site-packages/proxybroker/api.py:108: in find
    await self._run(self._checker.check_judges(), action)
../venv/lib/python3.5/site-packages/proxybroker/api.py:114: in _run
    await tasks
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__
    yield self  # This tells Task to wait for completion.
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup
    future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
    raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:241: in _step
    result = coro.throw(exc)
../venv/lib/python3.5/site-packages/proxybroker/checker.py:26: in check_judges
    await asyncio.gather(*[j.check() for j in self._judges])
/usr/lib/python3.5/asyncio/futures.py:361: in __iter__
    yield self  # This tells Task to wait for completion.
/usr/lib/python3.5/asyncio/tasks.py:296: in _wakeup
    future.result()
/usr/lib/python3.5/asyncio/futures.py:274: in result
    raise self._exception
/usr/lib/python3.5/asyncio/tasks.py:239: in _step
    result = coro.send(None)
../venv/lib/python3.5/site-packages/proxybroker/judge.py:62: in check
    page = await resp.text()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <ClientResponse(http://ip.spys.ru/) [200 OK]>
<CIMultiDictProxy('Date': 'Thu, 18 Aug 2016 11:02:53 GMT', 'Server': 'Ap...': 'no-cache', 'Vary': 'Accept-Encoding', 'Transfer-Encoding': 'chunked', 'Content-Type': 'text/html; charset=UTF-8')>

encoding = 'utf-8'

    @asyncio.coroutine
    def text(self, encoding=None):
        """Read response payload and decode."""
        if self._content is None:
            yield from self.read()
    
        if encoding is None:
            encoding = self._get_encoding()
    
>       return self._content.decode(encoding)
E       UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 5568: invalid continuation byte

../venv/lib/python3.5/site-packages/aiohttp/client_reqrep.py:758: UnicodeDecodeError
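
The problem seems to be in proxybroker, or rather in the aiohttp package. But since it is supposed to be a well-tested package, the problem may be in my code.

Can anyone see what I am doing wrong, or does anyone have suggestions on how to use proxybroker?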

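The problem is in the await resp.text() line. It retrieves the HTML page as text. aiohttp tries to determine the correct encoding using the chardet library, but for malformed pages that is not possible.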
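
The failure can be reproduced without any network access. The snippet below is a minimal sketch, not proxybroker or aiohttp code: the payload and the try_decode helper are made up for illustration. It mimics what the traceback shows, a body that the response header declares as UTF-8 but that contains a stray 0xd6 byte, decoded strictly.

```python
# Hypothetical payload: ASCII text with one stray 0xd6 byte, like the byte
# reported in the traceback. In UTF-8, 0xd6 starts a two-byte sequence, so it
# must be followed by a continuation byte; here it is followed by a space.
raw = b"<html>proxy list \xd6 end</html>"


def try_decode(data, encoding):
    """Return (text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


text, err = try_decode(raw, "utf-8")
print(err)        # the same kind of error as in the traceback
print(err.start)  # offset of the offending byte within the payload
```

Strict decoding stops at the first invalid byte, so a single bad byte anywhere in the page is enough to make the whole resp.text() call fail.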


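I think resp.text() should be replaced with resp.read(), so that the page is fetched as bytes without being decoded to str.

Thanks, I filed an issue with proxybroker! That does seem to be the problem. If I change resp.text() to resp.read(), I get a bytes object instead of a string, and I have to convert it to a string somewhere. But that conversion will always throw a decode error, because there is a byte in the response that cannot be read, right?

For proxybroker, decoding as 'latin1' is enough. It never fails.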
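
The claim that latin1 never fails holds because latin-1 maps each of the 256 byte values to exactly one code point. The sketch below illustrates this under some assumptions: decode_lenient and the sample page fragment are hypothetical names and data, not part of proxybroker, and the idea that only ASCII content such as IP addresses matters in the page is an assumption.

```python
import re


def decode_lenient(data):
    """Decode bytes as latin-1; every byte value maps to a code point, so this cannot raise."""
    return data.decode("latin-1")


# Every possible byte value decodes and round-trips unchanged.
all_bytes = bytes(range(256))
assert decode_lenient(all_bytes).encode("latin-1") == all_bytes

# Hypothetical page fragment ending in a stray non-UTF-8 byte.
raw = b"REMOTE_ADDR = 203.0.113.7 \xd6"
page = decode_lenient(raw)

# ASCII content such as an IP address is unaffected by the choice of latin-1.
ip_match = re.search(r"\d{1,3}(?:\.\d{1,3}){3}", page)
print(ip_match.group(0))

# Alternative: keep UTF-8 but substitute U+FFFD for undecodable bytes.
replaced = raw.decode("utf-8", errors="replace")
print(replaced)
```

The trade-off is that non-ASCII text in a page that really is UTF-8 will be decoded as the wrong characters under latin-1, which only matters if that text is actually used. The errors="replace" variant keeps valid UTF-8 text intact and loses only the bad bytes.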