Python: find the next occurrence of a tag and the text it contains

python html python-2.7 beautifulsoup

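I am trying to parse the text between <blockquote> tags. When I type soup.blockquote.get_text(), I get the result for the first blockquote that occurs in the HTML file. How do I find the next <blockquote> tag in the file, and the ones after it in sequence? Maybe I'm just tired and can't find it in the documentation.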

Example HTML file:

<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>

Use find_next_sibling (if the element is not a sibling, use find_next instead).

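What do you mean by blockquote? Is it the HTML <blockquote> element? If so, does <blockquote> need any special handling compared with other HTML tags? I don't know, so I'm leaving this comment to clarify the point. bs4, or any other HTML parsing code that works for other HTML tags, works for <blockquote> as well. Thanks.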

from bs4 import BeautifulSoup

# Parse the example HTML file shown above.
html_doc = open("example.html")
soup = BeautifulSoup(html_doc)
# Prints the text of the first <blockquote> only.
print(soup.blockquote.get_text())
# how to get the next blockquote???
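
If every blockquote in the file is wanted, one option is to collect them all in document order instead of stepping from one to the next. The sketch below is only an illustration, not part of the original question: it inlines the example HTML as a string, passes "html.parser" explicitly, and strips surrounding whitespace from each result. All three are assumptions made for the demo.

```python
from bs4 import BeautifulSoup

# The example HTML from the question, inlined so the sketch is self-contained.
html = """<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>"""

# Assumption: the built-in "html.parser" is used so no extra parser is needed.
soup = BeautifulSoup(html, "html.parser")

# find_all returns every matching tag in document order.
quotes = [bq.get_text().strip() for bq in soup.find_all("blockquote")]

for text in quotes:
    print(text)
```

Each entry in quotes is a separate string, so the second and third blockquotes can be handled independently (quotes[1], quotes[2]).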

>>> html = '''
... <html>
... <head>header
... </head>
... <blockquote>blah blah
... </blockquote>
... <p>eiaoiefj</p>
... <blockquote>capture this next
... </blockquote>
... <p></p><strong>don't capture this</strong>
... <blockquote>
... capture this too but separately after "capture this next"
... </blockquote>
... </html>
... '''

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html)
>>> quote1 = soup.blockquote
>>> quote1.text
u'blah blah\n'
>>> quote2 = quote1.find_next_sibling('blockquote')
>>> quote2.text
u'capture this next\n'
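
The difference between find_next_sibling and find_next only shows up when the next blockquote sits at a different nesting level. The following sketch uses hypothetical markup (the <div> wrapper is not in the original question) and assumes "html.parser" as the parser, just to illustrate that case.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second blockquote is nested inside a <div>,
# so it is not a sibling of the first one.
html = """
<html><body>
<blockquote>first</blockquote>
<div><blockquote>second, nested</blockquote></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
first = soup.blockquote

# find_next_sibling only considers later elements that share the same parent.
sibling = first.find_next_sibling("blockquote")

# find_next searches everything that follows in document order,
# regardless of nesting.
following = first.find_next("blockquote")

sibling_text = None if sibling is None else sibling.get_text()
following_text = following.get_text()
print(sibling_text)
print(following_text)
```

Here find_next_sibling finds nothing because the nested blockquote has a different parent, while find_next still reaches it.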