Python: extracting HTML text between elements
I want to use Beautiful Soup to scrape the albums and songs from this website. The HTML looks like this:
<div id="listAlbum">
<a id="19215"></a>
<div class="album">
"album: "
<b>"3 Feet High And Rising"</b> == $0
" (1989)"
</div>
<a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro
</a>
<br>
<a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
<br>
<a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
<br>
I can't figure out how to get the songs. I tried this:
for s in soup(text=re.compile(r'target="_blank">')):
    print(s.parent)
Any ideas?
Try this. I hope it gives you the output you need:
from bs4 import BeautifulSoup
html_content='''
<div id="listAlbum">
<a id="19215">
</a>
<div class="album">
"album: "
<b>
"3 Feet High And Rising"
</b>
== $0
" (1989)"
</div>
<a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">
Intro
</a>
<br/>
<a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">
The Magic Number
</a>
<br/>
<a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">
Change In Speak
</a>
<br/>
</div>
'''
soup = BeautifulSoup(html_content, "lxml")
for item in soup.select("#listAlbum .album,#listAlbum a"):
    print(item.text.strip())
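One detail worth noting: the selector `#listAlbum a` also matches the empty `<a id="19215">` anchor, so the loop above prints a blank line for it. The following is a minimal sketch of one way to skip it, assuming song links always carry an `href` attribute. The sample HTML is a trimmed copy of the question's markup; the `== $0` marker and the quotes around text nodes appear to be browser DevTools display artifacts, so they are left out here. It uses the built-in `html.parser` so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup (DevTools artifacts removed; an assumption).
html_content = """
<div id="listAlbum">
<a id="19215"></a>
<div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
<a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro
</a>
<br>
<a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
<br>
</div>
"""

soup = BeautifulSoup(html_content, "html.parser")

# a[href] keeps only anchors that have an href, which drops <a id="19215">.
# get_text(" ", strip=True) strips each text fragment and joins them with a space.
lines = [
    item.get_text(" ", strip=True)
    for item in soup.select("#listAlbum .album, #listAlbum a[href]")
]

for line in lines:
    print(line)
```

The results come back in document order, so the album line is printed before its songs.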
Here is another approach:
# Print every album
albums = soup.find_all(class_="album")
for album in albums:
    print(album.get_text())

# Print every song
songs = soup.find_all('a', target="_blank")
for song in songs:
    print(song.get_text())
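Both loops above print flat lists, so on a page with several albums the link between a song and its album is lost. Here is a rough sketch of one way to keep that link, assuming (as in the question's markup) that each album `div` and its song links are direct children of `#listAlbum` and that songs follow their album in document order. The `songs_by_album` name is made up for this illustration.

```python
from bs4 import BeautifulSoup

# Sample markup taken from the question, with DevTools artifacts removed (an assumption).
html_content = """
<div id="listAlbum">
<a id="19215"></a>
<div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
<a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
<br>
<a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
<br>
<a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
<br>
</div>
"""

soup = BeautifulSoup(html_content, "html.parser")

songs_by_album = {}   # hypothetical name: album heading -> list of (title, url)
current_album = None

# Walk album headings and song links together, in document order.
for tag in soup.select("#listAlbum > div.album, #listAlbum > a[href]"):
    if tag.name == "div":
        # A new album heading starts a new group.
        current_album = tag.get_text(" ", strip=True)
        songs_by_album[current_album] = []
    elif current_album is not None:
        # A song link belongs to the most recent album heading.
        songs_by_album[current_album].append((tag.get_text(strip=True), tag["href"]))

for album, songs in songs_by_album.items():
    print(album)
    for title, url in songs:
        print("   ", title, url)
```

A dictionary keyed by the heading text is only one possible shape; a list of records would work just as well if two albums could share a heading.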
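This works well. Thanks! Two questions: 1) How did you know to use BeautifulSoup(html_content, "lxml")? 2) How does soup.select work here? Does the "a" in the second selector, "#listAlbum a", refer to the "a" in "a href"?
Your first question is unclear. As for the second: soup.select works with CSS selectors, so you may want to look at the documentation for those. As for your last question: # is the CSS symbol for an ID, so #listAlbum refers to the element with id="listAlbum", and the a after the space matches every <a> tag nested inside that element. Be sure to mark this as the answer. Thanks.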
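To make the two questions from the comments concrete, here is a small sketch. The second argument to `BeautifulSoup` is the name of the parser: `"lxml"` requires the separate lxml package, while `"html.parser"` ships with Python and is used here. The selector pieces are then tried one at a time. The final link outside `#listAlbum` is not from the original page; it is added only to show what the descendant selector excludes.

```python
from bs4 import BeautifulSoup

html_content = """
<div id="listAlbum">
<a id="19215"></a>
<div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
<a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
</div>
<a href="https://www.azlyrics.com/">outside the list (made up for illustration)</a>
"""

# Second argument = parser name; "html.parser" is Python's built-in one.
soup = BeautifulSoup(html_content, "html.parser")

by_id = soup.select_one("#listAlbum")        # '#' selects by id
by_class = soup.select("#listAlbum .album")  # '.' selects by class, inside #listAlbum
inside = soup.select("#listAlbum a")         # <a> tags nested inside #listAlbum
everywhere = soup.select("a")                # every <a> tag in the document

print(by_id.name, by_id["id"])
print(len(by_class), len(inside), len(everywhere))
```

The space in `#listAlbum a` is the CSS descendant combinator, which is why the link outside the `div` is matched by `a` alone but not by `#listAlbum a`.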
Output of the first snippet:
"album: "
"3 Feet High And Rising"
== $0
" (1989)"
Intro
The Magic Number
Change In Speak