Python 3.x: looking for a way to avoid getting banned while crawling


python-3.x instagram


I am making a lot of requests in Python to the page

https://www.instagram.com/explore/tags/some_hashtag/?__a=1

The code is as follows:

import random

import requests


def LoadUserAgents(uafile):
    """
    uafile : string
        path to text file of user agents, one per line
    """
    uas = []
    # read as text so each header value is a str rather than bytes
    with open(uafile, 'r', encoding='utf-8') as uaf:
        for ua in uaf:
            ua = ua.strip()
            if ua:  # skip blank lines
                uas.append(ua)
    random.shuffle(uas)
    return uas

# hashtag (e.g. '#some_hashtag') and proxy are defined elsewhere in the script
address = f'https://www.instagram.com/explore/tags/{hashtag[1:]}/?__a=1'
uas = LoadUserAgents("user-agents.txt")
ua = random.choice(uas)
headers = {
    "Connection": "close",
    "User-Agent": ua}

r = requests.get(address, proxies=proxy, timeout=30, headers=headers)

The text file "user-agents.txt" comes from …

An example of the variable proxy is

proxy = {'http': 'http://104.196.45.252:80'}

But I can still see in the logs that I frequently get short temporary bans:

{'message': 'Please wait a few minutes before you try again.', 'status': 'fail'}

After such a ban I immediately switch to a different proxy and user agent, but the following requests show that I am still banned:

[Crawler @ 17_07_2018_15h29m34s] 
Error message:{'message': 'Please wait a few minutes before you try again.', 'status': 'fail'} 
Proxy:{'http': 'http://104.196.45.252:80'}
Header: {'Connection': 'close', 'User-Agent': b'Mozilla/5.0 (Windows; U; Windows NT 5.0; fr; rv:1.8.1.9pre) Gecko/20071102 Firefox/2.0.0.9 Navigator/9.0.0.3'}

[Crawler @ 17_07_2018_15h29m44s]
Error message: {'message': 'Please wait a few minutes before you try again.', 'status': 'fail'} 
Proxy:{'http': 'http://52.77.242.220:80'} 
Header: {'Connection': 'close', 'User-Agent': b'Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.7.3) Gecko/20040910'}

....
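The rotation described above (switch to a different proxy and user agent as soon as the "Please wait a few minutes" reply comes back) can be sketched roughly as follows. This is a hypothetical illustration, not the poster's actual code: the helper names is_rate_limited and next_identity are assumptions, and the two proxies are simply the ones that appear in the log.

```python
import random


def is_rate_limited(payload):
    """True if the parsed JSON body is the temporary-ban reply seen in the log."""
    return (payload.get("status") == "fail"
            and "Please wait a few minutes" in payload.get("message", ""))


def next_identity(proxies, user_agents, current_proxy=None):
    """Pick a proxy other than the one just banned, plus a random user agent.

    Falls back to the full list if there is no alternative proxy.
    """
    candidates = [p for p in proxies if p != current_proxy] or list(proxies)
    proxy = random.choice(candidates)
    headers = {"Connection": "close", "User-Agent": random.choice(user_agents)}
    return proxy, headers


# Example with the two proxies from the log above (user agent shortened).
proxy_pool = [
    {"http": "http://104.196.45.252:80"},
    {"http": "http://52.77.242.220:80"},
]
reply = {"message": "Please wait a few minutes before you try again.", "status": "fail"}
if is_rate_limited(reply):
    proxy, headers = next_identity(proxy_pool, ["Mozilla/5.0 ..."], proxy_pool[0])
    print(proxy)  # {'http': 'http://52.77.242.220:80'}
```

As the log shows, this kind of rotation on its own did not lift the ban.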

Any ideas what I am doing wrong, or what I should add to avoid the problem?

Thanks, everyone!

Try providing a proxy for HTTPS traffic; at the moment the proxy you provide is not being used.

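There are more sophisticated ways than the IP address or user agent to detect that you are the same person... such as canvas fingerprinting or other fingerprinting methods... What can you do about that from Python? Nothing.

Sorry, I'm not sure I understood you correctly. Do you mean that an "http" proxy can't be used for this purpose?

I don't know why I was downvoted without any comment. In any case, you are trying to connect to an https URL while providing a proxy only for http. These are not the same, so the proxy will not be used.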
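To illustrate the answer's point: requests chooses a proxy entry by the scheme of the target URL, so a dict with only an 'http' key is ignored for https://www.instagram.com/.... The sketch below is a simplified model of that lookup; pick_proxy and proxies_for_both_schemes are hypothetical helper names, not requests APIs, and the real library additionally merges proxy settings from environment variables such as HTTPS_PROXY. The proxy URL itself can stay http://..., because HTTPS requests are tunnelled through an HTTP proxy with CONNECT (whether a given free proxy supports that is a separate question).

```python
from urllib.parse import urlparse


def pick_proxy(url, proxies):
    """Simplified model of scheme-keyed proxy selection, as done by requests.

    Keys are tried from most to least specific; with no match, the request
    is sent directly, without any proxy.
    """
    parts = urlparse(url)
    for key in (f"{parts.scheme}://{parts.hostname}", parts.scheme,
                f"all://{parts.hostname}", "all"):
        if key in proxies:
            return proxies[key]
    return None


def proxies_for_both_schemes(proxy_url):
    """Route both http:// and https:// URLs through the same proxy."""
    return {"http": proxy_url, "https": proxy_url}


url = "https://www.instagram.com/explore/tags/some_hashtag/?__a=1"
print(pick_proxy(url, {"http": "http://104.196.45.252:80"}))  # None: proxy unused
print(pick_proxy(url, proxies_for_both_schemes("http://104.196.45.252:80")))
# http://104.196.45.252:80
```

In the question's code this amounts to passing a dict with an 'https' entry as proxies= to requests.get.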