Python: extracting the part of an HTML file that lies between two given headings (python, beautifulsoup)

I have a very long HTML file, and I want to extract the part of the HTML that lies between two given headings:

<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="justify">
<font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold">
<font style="DISPLAY: inline; TEXT-DECORATION: underline">ITEM 1A. RISK FACTORS</font></font></div>

    ---
    ---
    ---
    ---
<div style="TEXT-INDENT: 0pt; DISPLAY: block; MARGIN-LEFT: 0pt; MARGIN-RIGHT: 0pt" align="justify">
<font style="DISPLAY: inline; FONT-FAMILY: Times New Roman; FONT-SIZE: 12pt; FONT-WEIGHT: bold">
<font style="DISPLAY: inline; TEXT-DECORATION: underline">ITEM 1B. UNRESOLVED STAFF COMMENTS</font></font></div>

You can use a boolean flag defined outside the for loop to keep track of whether each element should be printed. For example:

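from bs4 import BeautifulSoup as soup

# page_html is assumed to hold the raw HTML of the file, already read in
page_soup = soup(page_html, "html.parser")

should_print = False  # becomes True once the start heading has been seen
# Only <font> elements are visited; nested <font> tags are each printed.
for item in page_soup.find_all('font'):
    if "ITEM 1A. RISK FACTORS" in item.text:
        should_print = True
    if "ITEM 1B. UNRESOLVED STAFF COMMENTS" in item.text:
        break  # stop as soon as the next section heading appears
    if should_print:
        print(item)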
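The flag-based loop only looks at `<font>` elements, so any body text between the two headings that is not wrapped in a `<font>` tag will not be printed. Below is a minimal sketch of an alternative, assuming both headings are `<div>` blocks that share the same parent element (as in the snippet in the question). It finds the start heading's text, climbs to its enclosing `<div>`, and walks the following siblings until one contains the end heading. `extract_between` is a hypothetical helper name. It uses the first occurrence of the start text, so a table of contents that repeats "ITEM 1A. RISK FACTORS" earlier in the file would need a more specific search.

```python
import re

from bs4 import BeautifulSoup, NavigableString, Tag


def extract_between(html, start_text, end_text):
    """Return the sibling nodes between the heading containing start_text
    and the heading containing end_text (both headings excluded).

    Assumption: both headings are <div> blocks with the same parent.
    """
    page_soup = BeautifulSoup(html, "html.parser")

    # Find the first text node containing the start heading, then climb to its <div>.
    start_node = page_soup.find(string=re.compile(re.escape(start_text)))
    if start_node is None:
        return []
    start_block = start_node.find_parent("div")
    if start_block is None:
        return []

    collected = []
    # next_siblings yields both tags and bare text nodes, in document order.
    for sibling in start_block.next_siblings:
        text = sibling.get_text() if isinstance(sibling, Tag) else str(sibling)
        if end_text in text:
            break  # reached the end heading
        if isinstance(sibling, NavigableString) and not text.strip():
            continue  # skip whitespace-only text between tags
        collected.append(sibling)
    return collected


if __name__ == "__main__":
    sample = (
        "<div><font><font>ITEM 1A. RISK FACTORS</font></font></div>\n"
        "<p>First risk factor.</p>\n"
        "<div><font><font>ITEM 1B. UNRESOLVED STAFF COMMENTS</font></font></div>"
    )
    for node in extract_between(sample, "ITEM 1A. RISK FACTORS",
                                "ITEM 1B. UNRESOLVED STAFF COMMENTS"):
        print(node)  # prints <p>First risk factor.</p>
```

Joining `str(node)` over the returned nodes rebuilds the HTML fragment, while calling `get_text()` on the tags gives plain text instead.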
