Python: Getting the text between spans with BeautifulSoup
I am trying to scrape various sites using BeautifulSoup in Python. Suppose I have the following HTML excerpt:
<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Residence:</span> Some Town<br/>
<span class="sub_heading">Religion:</span> Episcopalian<br/>
<span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
<span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>
However, so far I have only managed to get the following:
District:
Political Highlights:
Born:
Residence:
Religion:
Family:
Education:
Elected:
using this code:
import urllib.request
import sys
from bs4 import BeautifulSoup

def main(url):
    fp = urllib.request.urlopen(url)
    site_bytearray = fp.read()
    fp.close()
    #bs_data = BeautifulSoup(site_str,features="html.parser")
    bs_data = BeautifulSoup(site_bytearray,'lxml')
    tmplist = bs_data.find_all('span',{'class':'sub_heading'})
    for item in tmplist:
        print(item.text)
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1])
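In short, how do I extract District and AnyState from
District: AnyState - At Large
and accumulate the results in a list for further processing?

Replace the print command with the following.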
Python 3.6+:
print(f'{item.text:<25} {item.next_sibling}')
Have you tried using getText()? It always seems to work for me.
Can you point me to something that explains this? For example: what is {: ... Never mind, I figured it out. Thanks!
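For readers with the same question about the print command: the part after the colon inside the braces is a format specification. In {item.text:<25}, the < means left-align and 25 is the minimum field width, so each label is padded with spaces to 25 characters. The sketch below uses made-up variable names (label, value) rather than a parsed page, just to show the padding in isolation.

```python
# Stand-in values; in the answer these come from item.text and item.next_sibling.
label = "District:"
value = "AnyState - At Large"

# '<' left-aligns, '>' right-aligns, '^' centers; 25 is the minimum width.
left = f"{label:<25}|"
right = f"{label:>25}|"
centered = f"{label:^25}|"

# The f-string and str.format() spellings produce the same string.
line_fstring = f"{label:<25} {value}"
line_format = "{:<25} {}".format(label, value)

print(left)
print(right)
print(centered)
print(line_fstring)
```

The vertical bar is only there to make the trailing padding visible when printed.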
On Python versions before 3.6, the equivalent with str.format():
print('{:<25} {}'.format(item.text, item.next_sibling))
Output:
District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
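The question also asked for the results to be accumulated in a list rather than printed. One possible way to do that, building on the next_sibling approach above, is sketched here. The function name extract_pairs and the shortened HTML constant are made up for illustration, and the sketch uses the built-in html.parser instead of lxml so that it runs without an extra dependency. It assumes the text node directly follows each span, as in the excerpt.

```python
from bs4 import BeautifulSoup

# Shortened copy of the excerpt from the question, for illustration only.
HTML = """
<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>
"""

def extract_pairs(html):
    """Return a list of (label, value) tuples, one per sub_heading span."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.find_all("span", class_="sub_heading"):
        # Label text without surrounding whitespace or the trailing colon.
        label = span.get_text(strip=True).rstrip(":")
        # The value is the bare text node right after the span, if there is one.
        sibling = span.next_sibling
        value = sibling.strip() if isinstance(sibling, str) else ""
        pairs.append((label, value))
    return pairs

pairs = extract_pairs(HTML)
for label, value in pairs:
    print(f"{label:<25} {value}")
```

If a span is followed directly by another tag instead of text, this sketch stores an empty string for that value; whether that is the right fallback depends on the pages being scraped.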