Python: Why does the requests module load different content than my browser?


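I have the following Python code:

import requests
from bs4 import BeautifulSoup

req = requests.get("https://pythonhow.com/example.html")
content = req.content
soup = BeautifulSoup(content, "html.parser")
cities = soup.find_all(attrs={"class": "cities"})

When I paste this URL into my browser, I get the expected markup: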

<body data-gr-c-s-loaded="true" cz-shortcut-listen="true">
    <h1 align="center"> Here are three big cities </h1>
    <div class="cities">
        <h2>London</h2>
        <p>London is the capital of England and it's been a British settlement since 2000 years ago. </p>
    </div>
    <div class="cities">
        <h2>Paris</h2>
        <p>Paris is the capital city of France. It was declared capital since 508.</p>
    </div>
    <div class="cities">
        <h2>Tokyo</h2>
        <p>Tokyo is the capital of Japan and one of the most populated cities in the world.</p>
    </div>
</body>

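Why does requests fetch different content than my browser? I suspect it has something to do with the request headers, but I don't know where to start.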

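The error you are getting is "Pardon our interruption. Something about your browser made us think you were a bot". This means the site does not allow scraping and runs anti-bot protection on its pages. You need to add headers to your request. You can try: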

import requests
from bs4 import BeautifulSoup

# "x.y" is a placeholder for the Windows NT version (for example 10.0)
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT x.y; Win64; x64; rv:10.0) Gecko/20100101 Firefox/10.0'}
req = requests.get("https://pythonhow.com/example.html", headers=headers)
content = req.content
soup = BeautifulSoup(content, "html.parser")
cities = soup.find_all(attrs={"class": "cities"})
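
A block like this can also be detected in code before parsing, rather than by reading the returned HTML by eye. The following is only a sketch under stated assumptions: `looks_blocked`, `fetch_html`, `BLOCK_MARKERS`, and the list of status codes are hypothetical names and choices made for illustration, not part of requests, and a real site may word its block page differently.

```python
# Marker phrases are assumptions based on the block message quoted in this answer.
BLOCK_MARKERS = ("pardon our interruption", "think you were a bot")

# Status codes that commonly indicate a refused request (an assumption, not a rule).
BLOCK_STATUS_CODES = (403, 406, 429)


def looks_blocked(status_code, body):
    """Return True if a response looks like a bot-block page rather than real content."""
    if status_code in BLOCK_STATUS_CODES:
        return True
    text = body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def fetch_html(url, headers=None):
    """Fetch a page and raise if the response appears to be a block page."""
    import requests  # third-party; imported here so looks_blocked() has no dependencies

    response = requests.get(url, headers=headers, timeout=10)
    if looks_blocked(response.status_code, response.text):
        raise RuntimeError(
            f"Request to {url} appears to be blocked (status {response.status_code})"
        )
    return response.text
```

Calling `fetch_html("https://pythonhow.com/example.html", headers=headers)` would then either return the page HTML or fail loudly, instead of handing a block page to BeautifulSoup.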

It's all about the headers: the site checks them to decide whether your request comes from a bot.

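As you can see, your request was blocked by the Mod_Security web application firewall (WAF). To get past it, you can simply add headers and send the GET request again. You should also always check the response by printing the response variable r, to see what happened to the first request.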

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:76.0) Gecko/20100101 Firefox/76.0"
}


def main(url):
    r = requests.get(url, headers=headers)
    print(r)  # shows the status, e.g. <Response [200]>
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())


main("https://pythonhow.com/example.html")
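
Printing `r` only shows the status code. To see a little more of what happened, one option is to also print the User-Agent that was actually sent and the size of the body. This is a minimal sketch: `summarize_response` and `fetch_and_report` are hypothetical helper names. The attributes they read (`status_code`, `reason`, `content`, `request.headers`) are standard on a requests `Response`.

```python
def summarize_response(response):
    """Build a one-line summary of a requests.Response-like object."""
    sent_agent = response.request.headers.get("User-Agent", "<none>")
    return (
        f"{response.status_code} {response.reason} | "
        f"sent User-Agent: {sent_agent} | "
        f"body: {len(response.content)} bytes"
    )


def fetch_and_report(url, headers=None):
    """Send a GET request, print a summary of the response, and return it."""
    import requests  # third-party; imported here so summarize_response() has no dependencies

    r = requests.get(url, headers=headers, timeout=10)
    print(summarize_response(r))
    return r
```

Running `fetch_and_report(url)` once without headers and once with `headers=headers` makes it easy to compare the two responses side by side.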