Python: How do I extract text from a <span> nested in an <li> nested in a <ul> using BeautifulSoup?

python html web-scraping beautifulsoup

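I want to extract the latest entries from the "Here's what's new" section, starting at "In the coming weeks" and ending at "general enhancements".

Inspecting the code, I see a <span> nested under an <li>, which is in turn nested in a <ul>. For the past few days I have been trying to extract it with Python 3 and BeautifulSoup, without success. I'm pasting the code I tried below. Can someone point me in the right direction?

    # one
    # two

Ideally, the code should return:

    In the coming weeks, you will be able to read items that you own with a single click from the ‘Before You Go’ dialog.
    Performance improvements, bug fixes, and other general enhancements.

But neither attempt returned anything. It looks like it can't find the ul with that ID, yet if you print(soup) everything looks fine:

    <ul id="GUID-8B03C49D-3A98-45F1-9128-392E55823F61__UL_E0490B159DE04E22AD519CE2E7D7A35B">
      <li> Read Now: In the coming weeks, you will be able to read items that you own with a single click from the ‘Before You Go’ dialog.</li>
      <li> Performance improvements, bug fixes, and other general enhancements. </li>
    </ul>

First, the page is rendered dynamically, so you have to use selenium to get the page content properly. Second, you can find the p tag in which "Here’s what’s new" appears, and finally get the next ul tag after it. The code is as follows:

    from bs4 import BeautifulSoup as soup
    from selenium import webdriver

    url = "https://www.amazon.com/gp/help/customer/display.html/ref=hp_left_v4_sib?ie=UTF8&nodeId=G54HPVAW86CHYHKS"

    driver = webdriver.Firefox()
    driver.get(url)  # get() returns None, so there is nothing to assign
    html = soup(driver.page_source, 'html.parser')
    driver.quit()  # close the browser once the rendered source is captured

    for p in html.find_all('p'):
        # locate the paragraph that introduces the section
        if p.text and "Here’s what’s new" in p.text:
            # the list we want is the next <ul> sibling of that paragraph
            ul = p.find_next_sibling('ul')
            for li in ul.find_all('li'):
                print(li.text)

Output:

    Read Now: In the coming weeks, you will be able to read items that you own with a single click from the ‘Before You Go’ dialog.
    Performance improvements, bug fixes, and other general enhancements.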
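The "find the paragraph, then take the next ul sibling" approach above can be tried offline, without selenium or network access. The following is a minimal sketch only: the HTML fragment is a hypothetical stand-in modelled on the ul shown in the question (the real page may be structured differently), and items_after_heading is an illustrative helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the markup quoted in the question;
# the live page may differ.
HTML = """
<div>
  <p><strong>Here's what's new:</strong></p>
  <p>Version notes</p>
  <ul id="GUID-8B03C49D-3A98-45F1-9128-392E55823F61__UL_E0490B159DE04E22AD519CE2E7D7A35B">
    <li> Read Now: In the coming weeks, you will be able to read items that you own with a single click from the 'Before You Go' dialog.</li>
    <li> Performance improvements, bug fixes, and other general enhancements. </li>
  </ul>
</div>
"""


def items_after_heading(html, heading):
    """Return the <li> texts of the first <ul> sibling that follows
    a <p> whose text contains `heading` (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p"):
        if heading in p.get_text():
            ul = p.find_next_sibling("ul")
            if ul is not None:
                return [li.get_text(strip=True) for li in ul.find_all("li")]
    return []  # heading or list not found


items = items_after_heading(HTML, "what's new")
for item in items:
    print(item)
```

Returning an empty list when nothing matches avoids the AttributeError that would occur if find_next_sibling returned None and find_all were called on it directly.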
With bs4 4.7.1+, you can use :contains and :has to isolate the section:

    import requests
    from bs4 import BeautifulSoup as bs

    r = requests.get('https://www.amazon.com/gp/help/customer/display.html/ref=hp_left_v4_sib?ie=UTF8&nodeId=G54HPVAW86CHYHKS')
    soup = bs(r.content, 'lxml')
    # match the heading paragraph itself, plus every <li> in the <ul> that follows the next <p>
    text = [i.text.strip() for i in soup.select('p:has(strong:contains("Here’s what’s new:")), p:has(strong:contains("Here’s what’s new:")) + p + ul li')]
    print(text)

At present, you can also drop :contains:

    # relies on this being the only <p> on the page that contains a <strong>
    text = [i.text.strip() for i in soup.select('p:has(strong), p:has(strong) + p + ul li')]
    print(text)

+ is the CSS adjacent sibling combinator. Quote:

    Adjacent sibling combinator
    The + combinator selects adjacent siblings. This means that the second element directly follows the first, and both share the same parent.
    Syntax: A + B
    Example: h2 + p will match all <p> elements that directly follow an <h2>.

I believe your solution is fine, but because of the many problems with selenium and its drivers, I couldn't get it to run on my machine.
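To see how the + combinator and :has() behave without requesting the live page, here is a minimal sketch run against a hypothetical HTML fragment (the element contents are invented for illustration and are not taken from the Amazon page). It assumes bs4 4.7.1+ so that select() supports these selectors.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment for illustration only.
HTML = """
<div>
  <h2>Release notes</h2>
  <p><strong>Here's what's new:</strong></p>
  <p>Version 1.0</p>
  <ul>
    <li>First item</li>
    <li>Second item</li>
  </ul>
  <p>Unrelated paragraph</p>
  <ul>
    <li>Unrelated item</li>
  </ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# h2 + p: only the <p> that immediately follows the <h2>.
after_h2 = [p.get_text(strip=True) for p in soup.select("h2 + p")]
print(after_h2)  # ["Here's what's new:"]

# p:has(strong) + p + ul li: start at the <p> containing a <strong>,
# step to the adjacent <p>, then to the adjacent <ul>, and take its <li>s.
items = [li.get_text(strip=True)
         for li in soup.select("p:has(strong) + p + ul li")]
print(items)  # ['First item', 'Second item']
```

The second ul is not matched because the p before it is not itself directly preceded by a p containing a strong. Note also that newer Soup Sieve releases deprecate :contains in favour of :-soup-contains, so the text-matching form of the selector may emit a warning on recent installations.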