Python BeautifulSoup html.parser not working


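I have a script that fetches book information from Amazon. It used to run successfully, but today it failed. I can't pinpoint exactly what went wrong, but I assume it is related to the parser or to JavaScript. I am using the code below.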

from bs4 import BeautifulSoup
import requests

response = requests.get('https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=9780307397980',headers={'User-Agent': b'Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0'})
html = response.content
soup = BeautifulSoup(html, "html.parser")
resultcol = soup.find('div', attrs={'id':'resultsCol'})

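Previously I got data in resultcol, but now it is empty. When I inspect html, I can see the tag I am looking for, the div with id resultsCol. But this text is not in soup. Can someone help me debug this? It used to work perfectly, but now it doesn't.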
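One way to narrow this down is to check the same element id in two places: the raw response bytes and the parsed tree. The sketch below is only an illustration; the helper name compare_raw_and_parsed and the sample markup are hypothetical stand-ins, not part of the original script.

```python
from bs4 import BeautifulSoup


def compare_raw_and_parsed(html_bytes, element_id, parser="html.parser"):
    """Return (in_raw, in_soup) for an element id.

    in_raw  -- the id string occurs somewhere in the raw response bytes
    in_soup -- the parser produced a tag carrying that id
    """
    in_raw = element_id.encode("utf-8") in html_bytes
    soup = BeautifulSoup(html_bytes, parser)
    in_soup = soup.find(id=element_id) is not None
    return in_raw, in_soup


# Hypothetical stand-in for response.content; the real page is far larger.
sample = b'<html><body><div id="resultsCol"><a href="/x">book</a></div></body></html>'

print(compare_raw_and_parsed(sample, "resultsCol"))
print(compare_raw_and_parsed(sample, "missingCol"))
```

If the id shows up in the raw bytes but not in the tree, the parser is a likely suspect, and passing a different parser name (for example "lxml" or "html5lib", if installed) is worth a try. If it is missing from both, the server probably returned a different page than the one seen in the browser.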

You need to wait for the page to load completely. You have to use PhantomJS to make sure the page loads correctly.

I was able to get the correct element with the following code:

from bs4 import BeautifulSoup
from selenium import webdriver

url = ("https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3D"
       "stripbooks&field-keywords=9780307397980")

# Needs Selenium 3.x and a PhantomJS binary on PATH;
# webdriver.PhantomJS was removed in Selenium 4.
browser = webdriver.PhantomJS()
browser.get(url)
html = browser.page_source  # HTML after JavaScript has run
browser.quit()
soup = BeautifulSoup(html, 'lxml')
resultcol = soup.find('img', attrs={'class': 's-access-image'})
print(resultcol)

Remove the headers and it should work:

from bs4 import BeautifulSoup
import requests
response = requests.get('https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=9780307397980')
html = response.content
soup = BeautifulSoup(html, "html.parser")
resultcol = soup.find('div', attrs={'id':'resultsCol'})

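That's strange! I ran this code but left out the headers, because I haven't used requests in a while and don't remember all the details. Everything seemed to work fine! Maybe try running it without passing the headers? There may be an error in your headers.

Still doesn't work. The headers look fine. I get the correct information in html, but soup does not capture all of the markup. I found the following error:

AttributeError: 'bytes' object has no attribute 'parser'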

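"AttributeError: 'NoneType' object has no attribute 'foo' - This usually happens because you called find() and then tried to access the .foo attribute of the result. But in your case, find() didn't find anything, so it returned None, instead of returning a tag or a string. You need to figure out why your find() call isn't returning anything." That is from the BS4 documentation. I think the problem is that the 'div' I get this way is not the same as what I see in Chrome's View Source. Another part of my script needs to find the link that ends with keywords=9780307397980:

for url in resultcol.find_all('a'):
    if url['href'].endswith('keywords=' + str(ISBN_search)):
        if 'dp' in url['href']:
            sub_link.append(url['href'])
            links[link_search-1] = url['href']
            break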
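The None case quoted from the documentation can be guarded against before looping over links. The following is a minimal sketch under assumptions: the function name links_ending_with and the sample markup are hypothetical, and it only illustrates checking the result of find() before calling find_all() on it.

```python
from bs4 import BeautifulSoup


def links_ending_with(html, container_id, suffix):
    """Collect hrefs inside the container that end with `suffix`.

    Returns an empty list when the container is absent, instead of
    raising AttributeError on None.
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"id": container_id})
    if container is None:
        return []
    # href=True skips <a> tags that have no href attribute at all.
    return [a["href"] for a in container.find_all("a", href=True)
            if a["href"].endswith(suffix)]


# Hypothetical markup standing in for the search results page.
sample = """
<div id="resultsCol">
  <a href="/Some-Book/dp/030739798X/ref=sr_1_1?keywords=9780307397980">hit</a>
  <a href="/help">other</a>
</div>
"""

print(links_ending_with(sample, "resultsCol", "keywords=9780307397980"))
print(links_ending_with(sample, "noSuchId", "keywords=9780307397980"))
```

An empty result then means either that the container was not parsed or that no link matched, which is easier to investigate than an AttributeError raised further down the script.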
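@NeilS You can refine the element search from here.

I can't find the URL that contains that element. I can see it in Chrome, but not in soup.

@NeilS Check the updated code for the required element. You will need a regular expression to get the URL.

Sorry if I was unclear before, but I am looking for this URL link in the script: https://www.amazon.com/Year-Flood-MaddAddam-Trilogy/dp/030739798X/ref=sr_1_1?ie=UTF8&qid=1536785536&sr=8-1&keywords=9780307397980

...If you open the link https://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=9780307397980 in Chrome, view the source (Ctrl+U) and go to line 1419 (I don't know whether it is the same for you), I can see this link. In the soup we just parsed, I can't see this part. Do you know why?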
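Since the link is visible in the raw page source, the regular-expression route suggested in the comments can be tried on the response text directly, without going through the parse tree. This is a rough sketch: the pattern is an assumption about how product links look (a /dp/ segment followed by a 10-character identifier), and product_links is a hypothetical helper name.

```python
import html
import re

# Matches an href value that contains /dp/<10 characters>/; the pattern is
# an assumption about how product links look, not a documented format.
PRODUCT_HREF = re.compile(r'href="([^"]*/dp/[0-9A-Z]{10}/[^"]*)"')


def product_links(page_text, isbn):
    """Return product URLs from raw HTML that end with keywords=<isbn>."""
    links = []
    for raw in PRODUCT_HREF.findall(page_text):
        url = html.unescape(raw)  # turn &amp; back into &
        if url.endswith("keywords=" + isbn):
            links.append(url)
    return links


# Sample built from the URL quoted in the thread, with & escaped as in HTML source.
sample = ('<a href="https://www.amazon.com/Year-Flood-MaddAddam-Trilogy/dp/'
          '030739798X/ref=sr_1_1?ie=UTF8&amp;qid=1536785536&amp;sr=8-1'
          '&amp;keywords=9780307397980">title</a>')

print(product_links(sample, "9780307397980"))
```

In the original script this would presumably be called with response.text rather than the sample string. Matching on raw text is brittle compared with a parser, so it is best treated as a fallback when the parsed tree does not contain the element.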