Python: Skipping lines and stripping whitespace when parsing HTML code


I have the following HTML code:

html_doc = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
                                    East Counties</h2>
<h2>
                                    East Midlands</h2>
<h2>
                                    London</h2>
<h2>
                                    North East</h2>
<h2>
                                    North West</h2>
<h2>
                                    South East</h2>
<h2>
                                    South West</h2>
<h2>
                                    West Midlands</h2>
<h2>
                                    Yorkshire and Humberside</h2>
<h2>
                                    Northern Ireland</h2>
<h2>
                                    Scotland</h2>
<h2>
                                    Wales</h2>
"""
Expected result:

East Counties
East Midlands
London
North East
...

What am I doing wrong?

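You can use slicing here: find_all returns a list-like result, so you can index into it with a slice such as [4:] to skip the first four headings, and then call strip() on each heading's text to remove the surrounding whitespace.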
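To see what the slice and strip() each contribute, here is a minimal sketch that uses plain strings only. The `headings` list is a hypothetical stand-in for the text of each h2 tag, shortened to a few entries; it is not produced by BeautifulSoup here.

```python
# Hypothetical sample data: the raw text of each <h2>, in document order.
# The region headings start with a newline and indentation, as in the HTML.
headings = [
    " API guidance for developers",
    "Images",
    "Score descriptors",
    "Downloadable XML data files (updated daily)",
    "\n                                    East Counties",
    "\n                                    East Midlands",
    "\n                                    London",
]

# [4:] keeps everything from index 4 onward, so the first four items are skipped.
# strip() removes leading and trailing whitespace, including the newline.
regions = [text.strip() for text in headings[4:]]

for region in regions:
    print(region)
```

The slice does not modify the original list; it returns a new one, so `headings` still holds all seven entries afterwards.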




Why can't you just use soup.find_all('h2')[4:]?
for h2 in soup.find_all('h2')[4:]:
    print(h2.text.strip())

East Counties
East Midlands
London
North East
North West
...

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')

for h2 in soup.find_all('h2')[4:]: # slicing to skip the first 4 elements
    print(h2.text.strip()) # get the inner text of the tag and then strip the white space
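
As a possible variation, the stripped names can be collected into a list instead of printed, using get_text(strip=True) in place of .text.strip(). This is only a sketch: the html_doc below is an abbreviated copy of the one in the question, and the `regions` name is my own.

```python
from bs4 import BeautifulSoup

# Abbreviated sample of the question's HTML (assumption: same layout,
# four leading headings followed by the region headings).
html_doc = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
                                    East Counties</h2>
<h2>
                                    London</h2>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# Skip the first four <h2> tags, then take each remaining tag's text
# with surrounding whitespace removed.
regions = [h2.get_text(strip=True) for h2 in soup.find_all("h2")[4:]]

print(regions)
```

Note that the hard-coded 4 only works while the page keeps exactly four headings before the regions; if that layout changes, the slice has to change with it.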