Python: skipping lines and stripping whitespace when parsing HTML code
I have the following HTML code:
html_doc = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>Score descriptors</h2>
<h2>Downloadable XML data files (updated daily)</h2>
<h2>
East Counties</h2>
<h2>
East Midlands</h2>
<h2>
London</h2>
<h2>
North East</h2>
<h2>
North West</h2>
<h2>
South East</h2>
<h2>
South West</h2>
<h2>
West Midlands</h2>
<h2>
Yorkshire and Humberside</h2>
<h2>
Northern Ireland</h2>
<h2>
Scotland</h2>
<h2>
Wales</h2>
"""
Expected result:
East Counties
East Midlands
London
North East
...
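What am I doing wrong?

You can use slicing here: find_all returns a list-like result, so you can index and slice it, for example with [4:] to skip the first four headings. To ignore the surrounding whitespace, call strip() on each heading's text.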
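To illustrate why slicing and strip() are enough here, the following minimal sketch uses a plain Python list of strings in place of the tags that find_all returns. The `headings` list is hypothetical stand-in data modelled on the question's HTML, not actual parser output.

```python
# Stand-in for the text inside each <h2> tag (hypothetical sample data).
headings = [
    " API guidance for developers",
    "Images",
    "Score descriptors",
    "Downloadable XML data files (updated daily)",
    "\nEast Counties",
    "\nEast Midlands",
    "\nLondon",
]

# [4:] skips the first four items; strip() removes leading and trailing
# whitespace, including the newline that follows each opening <h2>.
regions = [text.strip() for text in headings[4:]]
print(regions)
```

The same two operations apply unchanged to the result of find_all, since it supports list-style slicing.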
Why can't you just use soup.find_all('h2')[4:]?
for h2 in soup.find_all('h2')[4:]:
    print(h2.text.strip())
East Counties
East Midlands
London
North East
North West
...
from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
for h2 in soup.find_all('h2')[4:]:  # slice to skip the first 4 elements
    print(h2.text.strip())  # get the tag's inner text, then strip the whitespace
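If the same extraction is needed in more than one place, it could be wrapped in a small helper. This is only a sketch: the function name `heading_texts` and its `skip` parameter are made up for illustration, and it uses `get_text(strip=True)` as an alternative to calling `.text.strip()`. The `sample` string is a shortened, hypothetical version of the HTML above.

```python
from bs4 import BeautifulSoup


def heading_texts(html, skip=0):
    """Return the stripped text of every <h2> in html, skipping the first `skip`."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like ResultSet, so it can be sliced directly.
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")[skip:]]


# Shortened, hypothetical sample in the same shape as html_doc.
sample = """
<h2> API guidance for developers</h2>
<h2>Images</h2>
<h2>
East Counties</h2>
<h2>
London</h2>
"""

print(heading_texts(sample, skip=2))
```

Note that a fixed skip count depends on the page layout: if headings are added or removed before the region names, the number has to be adjusted.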