How to get alternating child tags in Python BeautifulSoup


I am trying to get a series of data out of alternating tags in an HTML page. The HTML looks like this:


<div>
    <h3>Header</h3>
    <div>Text</div>
    <h3>Header</h3>
    <div>Text</div>
    ...
</div>

Since I can't grab each h3/div pair with something like "for each pair in div", how can I grab them efficiently?

There are many ways to do this, but to me the simplest is to select all the h3 tags and then traverse the DOM to get each one's next sibling.

Find all the headers, then grab the matching div from there:

for header in soup.select('div h3'):
    next_div = header.find_next_sibling('div')

element.find_next_sibling() returns a single element, or None if no such sibling is found.
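Because of that None return value, a header without a following div would raise an AttributeError as soon as .text is read. A minimal sketch of guarding against that, assuming the built-in html.parser and a made-up sample document (the "Orphan header" markup is illustrative only, not from the question):

```python
from bs4 import BeautifulSoup

# Illustrative markup: the last <h3> has no <div> after it.
html = """
<div>
    <h3>First header</h3>
    <div>First text</div>
    <h3>Orphan header</h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for header in soup.select("div h3"):
    next_div = header.find_next_sibling("div")
    # find_next_sibling() returns None when no later <div> sibling exists,
    # so check before reading the text.
    text = next_div.get_text(strip=True) if next_div is not None else None
    pairs.append((header.get_text(strip=True), text))

print(pairs)
```

Whether to store None, skip the header, or raise is a design choice that depends on how strict the scrape needs to be.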

Demo:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''\
... <div>
...     <h3>First header</h3>
...     <div>First div to go with a header</div>
...     <h3>Second header</h3>
...     <div>Second div to go with a header</div>
... </div>
... ''', 'html.parser')
>>> for header in soup.select('div h3'):
...     next_div = header.find_next_sibling('div')
...     print(header.text, next_div.text)
... 
First header First div to go with a header
Second header Second div to go with a header
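One caveat with the demo above: find_next_sibling('div') returns the nearest later div sibling, which need not be the element directly after the header. If a header is missing its div, it would silently pick up the next header's div. The sketch below compares the two behaviours; adjacent_div and text_of are hypothetical helpers, and the sample markup is invented for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup: the first <h3> has no <div> of its own.
html = """
<div>
    <h3>First header</h3>
    <h3>Second header</h3>
    <div>Second text</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def adjacent_div(header):
    """Return the <div> directly following header, or None (hypothetical helper)."""
    # Passing True matches any tag, so whitespace text nodes are skipped.
    neighbour = header.find_next_sibling(True)
    if neighbour is not None and neighbour.name == "div":
        return neighbour
    return None


def text_of(tag):
    """Return stripped text, or None if the tag is missing (hypothetical helper)."""
    return tag.get_text(strip=True) if tag is not None else None


headers = soup.select("div h3")
# Loose pairing: nearest later <div>, even if another header sits in between.
loose = [(text_of(h), text_of(h.find_next_sibling("div"))) for h in headers]
# Strict pairing: only a <div> that is the very next element.
strict = [(text_of(h), text_of(adjacent_div(h))) for h in headers]

print(loose)
print(strict)
```

For markup that strictly alternates h3 and div, as in the question, both variants give the same pairs; the strict form only matters when the page may be irregular.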