Python HTTP request blocked


python html regex


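I've been trying to get through a site whose first page is a math calculation that must be solved before you reach the main page. That part is solved. However, when I try to get anything else, I get the following: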

<script>
window.location.reload();
</script>
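
This stub is what comes back when the request is blocked, so a small check can make that failure explicit instead of silently parsing a page with no data. A minimal sketch: `is_reload_stub` is a hypothetical helper, and it assumes the blocked body is essentially just the script shown above.

```python
import re

# Matches a body that consists only of a <script> calling window.location.reload().
# Assumption: this is how the blocked response looks, as in the output above.
_RELOAD_STUB = re.compile(
    r"^\s*<script>\s*window\.location\.reload\(\);?\s*</script>\s*$",
    re.IGNORECASE,
)


def is_reload_stub(html: str) -> bool:
    """Return True if the page is only the JavaScript reload stub (hypothetical helper)."""
    return _RELOAD_STUB.match(html) is not None


if __name__ == "__main__":
    print(is_reload_stub("<script>\nwindow.location.reload();\n</script>\n"))  # True
    print(is_reload_stub("<html><body><p class='x'>Medium</p></body></html>"))  # False
```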



I've been studying this approach for a while, but this is the first time I've actually tried it:

import re
import requests


def login_tokyo(s):
    # Step 1: load the public page, which shows a simple addition captcha
    r = s.get('https://apcis.tmou.org/public/')
    # "[^>]*>" consumes the whole opening tag so only the span's text is captured
    str_number = re.findall(r'<span[^>]*>(.*?)</span>', r.text)[0]
    numbers = re.findall(r'[0-9]+', str_number)
    captcha = int(numbers[0]) + int(numbers[1])

    # Step 2: submit the captcha answer to reach the main page
    payload = {'captcha': captcha}
    r = s.post('https://apcis.tmou.org/public/?action=login', data=payload)
    check_text = re.findall(r'<b>(.*?)</b>', r.text)[0]
    print(check_text)

    # Step 3: post the company search form (field names as used by the site's form)
    payload1 = {'Param': 0, 'Value': 5797164, 'imo': '', 'callsign': '', 'name': '', 'compimo': 5797164,
                'compname': '', 'From': '01.06.2020', 'Till': '31.08.2020', 'authority': 0, 'flag': 0, 'class': 0,
                'ro': 0, 'type': 0, 'result': 0, 'insptype': -1, 'sort1': 0, 'sort2': 'DESC', 'sort3': 0,
                'sort4': 'DESC'
                }
    r = s.post('https://apcis.tmou.org/public/?action=getcompanies', data=payload1)
    perf_tm = re.findall(r'<p class=[^>]*>(.*?)</p>', r.text)
    print(r.text)
    print(perf_tm)


if __name__ == '__main__':
    with requests.Session() as s:
        login_tokyo(s)
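
The captcha step can be factored into a pure function, which also makes it testable without hitting the site. A sketch, assuming (as the code above does) that the captcha is an addition of the first two integers inside the first `<span>`; the function name `solve_math_captcha` and the sample HTML are hypothetical.

```python
import re


def solve_math_captcha(html: str) -> int:
    """Sum the first two integers inside the first <span> of the page.

    Assumption: the captcha is an addition such as "12 + 7" rendered in the
    first <span>, which is what the question's code relies on.
    """
    match = re.search(r"<span[^>]*>(.*?)</span>", html, re.DOTALL)
    if match is None:
        raise ValueError("no <span> found; the page may not be the captcha page")
    numbers = re.findall(r"[0-9]+", match.group(1))
    if len(numbers) < 2:
        raise ValueError(f"expected two numbers in the captcha, got {numbers!r}")
    return int(numbers[0]) + int(numbers[1])


if __name__ == "__main__":
    # Made-up sample markup; the real page's HTML may differ.
    sample = '<form><span class="captcha">12 + 7 =</span><input name="captcha"></form>'
    print(solve_math_captcha(sample))  # 19
```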


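print(check_text)

tells me I'm on the main page, but... nothing. From this particular request I expect

print(perf_tm)

to give me the Medium level. Any help is appreciated.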
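
Once the page does come back, it is also worth checking what the regex actually captures. A pattern of the form `<p class=[^>]+(.*?)</p>` leaves the tag's closing `>` inside the captured text, so it yields `>Medium` rather than `Medium`. A minimal sketch on a made-up `<p class="perf">` element (the class name and markup are assumptions, not the site's real HTML):

```python
import re

# Hypothetical markup; the real page's HTML may differ.
html = '<p class="perf">Medium</p>'

# "[^>]+" stops just before the tag's closing ">", so the lazy group
# starts at ">" and the ">" ends up in the captured text.
loose = re.findall(r'<p class=[^>]+(.*?)</p>', html)

# Consuming the ">" explicitly captures only the element's text.
strict = re.findall(r'<p class=[^>]*>(.*?)</p>', html)

print(loose)   # ['>Medium']
print(strict)  # ['Medium']
```

For anything beyond a quick check, an HTML parser is more robust than regular expressions against attribute order, nesting, and line breaks.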

EDIT:

Never mind, I was wrong: the session should handle all the cookies. It seems the site rejects requests without a User-Agent, so just do this:


def login_tokyo(s):
    # Override requests' default User-Agent for every request made through this session
    header = {'User-Agent': ''}
    s.headers.update(header)
    r = s.get('https://apcis.tmou.org/public/')
    str_number = re.findall(r'<span[^>]*>(.*?)</span>', r.text)[0]
    ...
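
To see what the override actually changes, a prepared request can be inspected without sending anything. A sketch using `requests.Session.prepare_request`; it only shows the header requests would send, and whether the site blocks the default value specifically is this answer's observation, not something the snippet verifies.

```python
import requests

URL = "https://apcis.tmou.org/public/"

with requests.Session() as s:
    # Without an override, requests identifies itself as "python-requests/<version>".
    before = s.prepare_request(requests.Request("GET", URL))
    print(before.headers["User-Agent"])

    # Session-level headers are merged into every request prepared by the session.
    s.headers.update({"User-Agent": ""})
    after = s.prepare_request(requests.Request("GET", URL))
    print(repr(after.headers["User-Agent"]))  # ''
```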

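If you don't want to handle cookies manually, you can also use (more information in this post), although most servers don't care whether the session ID you set already exists in their database and will accept any random session ID.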
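
A `requests.Session` keeps cookies it receives (such as `PHPSESSID`) in `s.cookies` and attaches them to later requests automatically. The sketch below sets a placeholder cookie by hand, since no real response is fetched, and inspects the prepared request; the cookie value is made up.

```python
import requests

URL = "https://apcis.tmou.org/public/"

with requests.Session() as s:
    # Normally the server's Set-Cookie fills s.cookies; here we set a
    # placeholder value by hand so the example runs without a network call.
    s.cookies.set("PHPSESSID", "abc123")
    prepared = s.prepare_request(requests.Request("GET", URL))
    # The jar's cookies are sent in the standard "Cookie" header.
    print(prepared.headers.get("Cookie"))  # PHPSESSID=abc123
```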

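What information do you need to get? I like the math-question captchas on sites; they work about as well as spelling-based ones.

I'm trying to get the company performance.

@ZarakiKenpachi btw, I strongly recommend using Postman/Insomnia to test requests manually and check which headers are required and which aren't. If you're lazy like me, just copy the request from your browser's Network tab as a cURL request and paste it here:

This works like a charm. Thanks for pinpointing the problem. I'll definitely read more about it so I understand it better.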
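
The manual "which headers are required?" check can be automated by removing one header at a time and re-testing. A sketch only: `find_required_headers` and the `request_ok` callback are hypothetical; in practice the callback would send the request and check for the reload stub, but here a fake check stands in so the example runs offline. Removing headers one at a time will not catch cases where the site needs any one of several headers.

```python
from typing import Callable, Dict, List


def find_required_headers(
    headers: Dict[str, str],
    request_ok: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the names of headers whose removal makes request_ok() fail.

    request_ok is a caller-supplied (hypothetical) check, e.g. one that sends
    the request with the given headers and returns False on the reload stub.
    """
    required = []
    for name in headers:
        # Same headers minus the one under test
        trimmed = {k: v for k, v in headers.items() if k != name}
        if not request_ok(trimmed):
            required.append(name)
    return required


if __name__ == "__main__":
    browser_headers = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "text/html",
        "Accept-Language": "en-US",
    }
    # Fake check standing in for a real request: "succeeds" only with a User-Agent.
    print(find_required_headers(browser_headers, lambda h: "User-Agent" in h))  # ['User-Agent']
```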


def login_tokyo(s):
    # Browser-like User-Agent plus a manually supplied session cookie
    # (the request header is named "Cookie", singular)
    header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.',
              'Cookie': 'PHPSESSID=xxxxxxxxxxxxxxxxxxxxxxxxxxxx'}
    s.headers.update(header)
    r = s.get('https://apcis.tmou.org/public/')
    str_number = re.findall(r'<span[^>]*>(.*?)</span>', r.text)[0]
    ...