
BeautifulSoup in Python keeps returning null even though the element exists


I'm running the code below to parse an Amazon page with Beautiful Soup in Python, but when I run the print line I consistently get nothing back. I'd like to know whether I'm doing something wrong, or whether there is an explanation/solution for this. Any help would be appreciated.

    import requests
    from bs4 import BeautifulSoup

    URL = 'https://www.amazon.ca/Magnetic-Erase-Whiteboard-Bulletin-Board/dp/B07GNVZKY2/ref=sr_1_3_sspa?keywords=whiteboard&qid=1578902710&s=office&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzOE5ZSkFGSDdCOFVDJmVuY3J5cHRlZElkPUEwMDM2ODA4M0dWMEtMWkI1U1hJJmVuY3J5cHRlZEFkSWQ9QTA0MDIwMjQxMEUwMzlMQ0pTQVlBJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

    headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}

    page = requests.get(URL, headers=headers)

    soup = BeautifulSoup(page.content, 'html.parser')

    title = soup.find(id="productTitle")

    print(title)

When the page is requested outside a normal browser environment, Amazon asks for a captcha, which I believe is why the element isn't there.
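
One quick way to confirm this is to look at what actually came back before searching for the element. The following is only a rough sketch, assuming Amazon's bot-check interstitial mentions "captcha" somewhere in its HTML (the exact markup can differ) and that the shortened dp/ASIN form of the URL resolves to the same product page:

    import requests
    from bs4 import BeautifulSoup

    # Assumed shorthand for the product URL from the question (dp/ASIN form)
    URL = 'https://www.amazon.ca/dp/B07GNVZKY2'
    headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}

    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

    # If the response is the bot check rather than the product page,
    # productTitle does not exist and find() returns None.
    if 'captcha' in page.text.lower():
        print('Amazon returned a bot-check page instead of the product page')
    else:
        print(soup.find(id='productTitle'))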


Amazon likely has specific measures in place to block "bots" from accessing its pages. Rather than scraping the page directly, I'd suggest looking at their API to see whether there is anything useful there.

Your code is absolutely correct. There seems to be an issue with the parser you're using (html.parser).

I used html5lib instead of html.parser, and the code works now:

    import requests
    from bs4 import BeautifulSoup

    URL = 'https://www.amazon.ca/Magnetic-Erase-Whiteboard-BulletinBoard/dp/B07GNVZKY2/ref=sr_1_3_sspa?keywords=whiteboard&qid=1578902710&s=office&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUEzOE5ZSkFGSDdCOFVDJmVuY3J5cHRlZElkPUEwMDM2ODA4M0dWMEtMWkI1U1hJJmVuY3J5cHRlZEFkSWQ9QTA0MDIwMjQxMEUwMzlMQ0pTQVlBJndpZGdldE5hbWU9c3BfYXRmJmFjdGlvbj1jbGlja1JlZGlyZWN0JmRvTm90TG9nQ2xpY2s9dHJ1ZQ=='

    headers = {"User-Agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}

    page = requests.get(URL, headers=headers)

    # Parse with html5lib instead of html.parser
    soup = BeautifulSoup(page.content, 'html5lib')

    title = soup.find(id='productTitle')

    print(title)
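
Note that html5lib is a third-party package, so it has to be installed (pip install html5lib) before BeautifulSoup can use it. As a small follow-up sketch, once find() returns the element, get_text(strip=True) pulls out just the product name; the None check is there because find() still returns None if the element is missing:

    # title is a bs4 Tag, or None if productTitle was not found
    if title is not None:
        print(title.get_text(strip=True))   # product name without surrounding whitespace
    else:
        print('productTitle not found in the parsed page')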
Some additional information not directly related to the answer:

Unlike another answer to this question, I was not asked for a captcha when accessing the page.

However, if Amazon detects that a bot is accessing the site, it changes the response content: remove the headers argument from the requests.get() call and then inspect page.text to see this.

The default headers added by the requests library cause the request to be identified as coming from a bot.
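
As a rough illustration of both points, the sketch below prints the default headers that requests sends (note the python-requests User-Agent) and then fetches the page without a browser-like User-Agent so the start of page.text can be inspected. What Amazon returns in that case is not guaranteed, and the short dp/ASIN URL is an assumed shorthand for the product URL above:

    import requests

    # Headers requests adds when none are supplied; the User-Agent identifies
    # the client as python-requests, which is easy to flag as a bot.
    print(requests.utils.default_headers())

    # Assumed shorthand for the product page used in the question (dp/ASIN form)
    URL = 'https://www.amazon.ca/dp/B07GNVZKY2'

    page = requests.get(URL)      # no custom User-Agent this time
    print(page.status_code)
    print(page.text[:500])        # compare this with the response from the browser-like request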