Python: BeautifulSoup .next_sibling not working


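I'm trying to scrape the text that sits between two divs.

I tried using .next_sibling, as suggested in another post, but it doesn't work.

My current code:

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, "html.parser")
    containers = soup.findAll("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        content = container.find("div", {"class": "info"}).find("div", {"class": "clear:both"})
        desc = content.next_sibling
        print(desc)

Can you help me understand how to access the text between divs using BeautifulSoup4?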

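The second div you are searching for has no class attribute. "clear:both" is the value of its style attribute, so the search has to match on style instead of class.

You should also add a check that the element was actually found before asking for its next sibling, because find() returns None when nothing matches.

Try this: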

import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        # The spacer div carries style="clear:both", not a class
        content = container.find("div", {"class": "info"}).find("div", {"style": "clear:both"})
        # find() returns None when nothing matches, so check before using it
        if content:
            desc = content.next_sibling
            print(desc)
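To see the difference without hitting the site, here is a minimal offline sketch. The HTML snippet is hypothetical: it is only assumed to resemble one question block on the page, and the names SAMPLE_HTML, by_class and by_style are made up for illustration. It shows the class lookup returning None while the style lookup finds the spacer div, whose next sibling is the text node.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one question block on the page.
SAMPLE_HTML = """
<div class="question">
  <div class="info">
    <h2>Example title</h2>
    <div style="clear:both"></div>
    Example description text
    <div class="footer">meta</div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = soup.find("div", {"class": "info"})

# Searching by class finds nothing, because "clear:both" is a style value.
by_class = info.find("div", {"class": "clear:both"})

# Searching by style finds the empty spacer div.
by_style = info.find("div", {"style": "clear:both"})

# next_sibling is the text node right after the spacer; strip the whitespace.
desc = by_style.next_sibling.strip() if by_style else None

print(by_class)  # None
print(desc)      # Example description text
```

Note that next_sibling returns whatever node comes next, which here is a text node with surrounding whitespace, hence the strip().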

Here is a simpler option using a CSS selector:

import requests
from bs4 import BeautifulSoup

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, 'html.parser')
    containers = soup.find_all("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        # Attribute selector: a div whose style attribute is exactly "clear:both"
        content = container.select_one("div[style='clear:both']")
        # select_one() returns None when nothing matches
        if content:
            desc = content.next_sibling
            print(desc)
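One caveat with the selector above: [style='clear:both'] compares the whole attribute value, so a variant such as style="clear: both;" would not match. The sketch below, which uses hypothetical markup and a made-up helper named text_after_spacer, compares the exact match with a substring match ([style*='clear']). Whether the looser selector is appropriate depends on what other styled divs the real page contains.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same spacer div written two different ways.
SAMPLE_HTML = """
<div class="info"><div style="clear:both"></div>first</div>
<div class="info"><div style="clear: both;"></div>second</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def text_after_spacer(container, selector):
    """Return the stripped node following the first match, or None."""
    spacer = container.select_one(selector)
    if spacer is None or spacer.next_sibling is None:
        return None
    return str(spacer.next_sibling).strip()


infos = soup.select("div.info")

# Exact attribute match only finds the first spelling.
exact = [text_after_spacer(i, "div[style='clear:both']") for i in infos]

# Substring match finds any style value containing "clear".
loose = [text_after_spacer(i, "div[style*='clear']") for i in infos]

print(exact)  # ['first', None]
print(loose)  # ['first', 'second']
```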

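OK, I found another solution:

for pageNumber in range(1565, 1566):
    address = "https://dojrzewamy.pl/cat/3/nowe/%d/seks" % pageNumber
    page = requests.get(address)
    soup = BeautifulSoup(page.content, "html.parser")
    containers = soup.findAll("div", {"class": "question"})
    for container in containers:
        h2 = container.find("div", {"class": "info"}).find("h2")
        info = container.find("div", {"class": "info"})
        # Calling a tag is shorthand for find_all(); this returns the text
        # nodes that are direct children of the info div
        print(info(text=True, recursive=False))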
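A small offline sketch of this last approach, again on hypothetical markup: find_all(string=True, recursive=False) returns only the text nodes that are direct children of the div, so the heading text is skipped. The result includes whitespace-only nodes, which the sketch filters out. The names SAMPLE_HTML, direct_text and description are illustrative, and string=True is used here as the newer spelling of text=True.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the info div on the page.
SAMPLE_HTML = """
<div class="info">
  <h2>Example title</h2>
  <div style="clear:both"></div>
  Example description text
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = soup.find("div", class_="info")

# string=True matches text nodes; recursive=False limits the search to
# direct children, so the text inside <h2> is not returned.
direct_text = info.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and trim the rest.
parts = [t.strip() for t in direct_text if t.strip()]
description = " ".join(parts)

print(description)  # Example description text
```

Compared with next_sibling, this does not depend on which element happens to precede the text, but it will also pick up any other loose text placed directly inside the div.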