Python: extract the HTML text between elements


So, I want to use Beautiful Soup to scrape the albums and songs from this website. The HTML looks like this:

<div id="listAlbum">
  <a id="19215"></a>
  <div class="album">
    "album: "
    <b>"3 Feet High And Rising"</b>
    " (1989)"
  </div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
  <br>
I can't figure out how to get the songs. I tried this:

import re

for s in soup(text=re.compile(r'target="_blank">')):
    print(s.parent)

Any ideas?
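There is a likely reason the attempt above prints nothing. Calling soup(...) is shorthand for find_all(). The text= argument (the older name for string=) is tested only against the text inside nodes, never against the raw markup. A pattern containing target="_blank"> therefore has nothing to match. Attribute values are filtered with keyword arguments instead.

Here is a minimal sketch. It assumes a simplified copy of the question's markup, treating the quotes around text in the snippet as Chrome DevTools display artifacts. It also assumes the built-in html.parser.

```python
import re
from bs4 import BeautifulSoup

# Simplified copy of the question's markup (DevTools display artifacts removed; an assumption).
html = """
<div id="listAlbum">
  <a id="19215"></a>
  <div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
  <br>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# string= (formerly text=) is matched against text nodes only, so markup never matches.
by_text = soup.find_all(string=re.compile(r'target="_blank">'))
print(by_text)  # → []

# Attributes are matched with keyword arguments instead.
by_attr = soup.find_all("a", target="_blank")
print([a.get_text() for a in by_attr])
# → ['Intro', 'The Magic Number', 'Change In Speak']
```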

Try this. I hope it gives you the output you need:

from bs4 import BeautifulSoup

html_content='''
  <div id="listAlbum">
   <a id="19215">
   </a>
   <div class="album">
    "album: "
    <b>
     "3 Feet High And Rising"
    </b>
    == $0
    " (1989)"
   </div>
   <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">
    Intro
   </a>
   <br/>
   <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">
    The Magic Number
   </a>
   <br/>
   <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">
    Change In Speak
   </a>
   <br/>
  </div>
'''
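# "lxml" selects the lxml parser (needs the lxml package); "html.parser" is the built-in alternative
soup = BeautifulSoup(html_content,"lxml")
# Selector group: the .album div plus every <a> inside #listAlbum, returned in document order
for item in soup.select("#listAlbum .album,#listAlbum a"):
    print(item.text.strip())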
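One thing to watch in the answer above: "#listAlbum a" also matches the empty <a id="19215"> anchor, which prints as a blank line. In addition, .text keeps the newlines inside the album block.

Below is a sketch of a tighter variant, assuming the same markup:
- "a[href]" keeps only anchors that carry an href.
- get_text(" ", strip=True) collapses the whitespace.

It uses the built-in html.parser so it runs without lxml. The sample HTML is a simplified assumption, with the DevTools display artifacts removed.

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's markup (an assumption, DevTools artifacts removed).
html = """
<div id="listAlbum">
  <a id="19215"></a>
  <div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
  <br>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# a[href] skips the empty placeholder anchor <a id="19215"></a>.
items = soup.select("#listAlbum .album, #listAlbum a[href]")
# Join each element's text pieces with single spaces, stripping surrounding whitespace.
lines = [item.get_text(" ", strip=True) for item in items]
for line in lines:
    print(line)
# album: "3 Feet High And Rising" (1989)
# Intro
# The Magic Number
# Change In Speak
```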

Here is another approach:

## Prints every album
albums = soup.find_all(class_="album")
for album in albums:
    print(album.get_text())

## Prints every song
songs = soup.find_all('a', target="_blank")
for song in songs:
    print(song.get_text())
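The two flat lists above lose the link between an album and its songs. Here is a sketch that keeps it. It assumes, as in the question's snippet, that each div.album header is followed by its song links as direct children of #listAlbum. It walks those children in order and attaches each linked <a> to the most recent album, keeping the href as well. The sample markup is a simplified assumption.

```python
from bs4 import BeautifulSoup

# Simplified copy of the question's markup (an assumption, DevTools artifacts removed).
html = """
<div id="listAlbum">
  <a id="19215"></a>
  <div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
  <br>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

albums = {}     # album title -> list of (song title, URL)
current = None  # title of the most recent album header seen
container = soup.find(id="listAlbum")

# find_all(recursive=False) with no name returns the direct child tags in document order.
for tag in container.find_all(recursive=False):
    if tag.name == "div" and "album" in tag.get("class", []):
        # The album title sits in the <b> tag; drop the surrounding quote characters.
        current = tag.b.get_text(strip=True).strip('"')
        albums[current] = []
    elif tag.name == "a" and tag.has_attr("href") and current is not None:
        albums[current].append((tag.get_text(strip=True), tag["href"]))

for album, songs in albums.items():
    print(album)
    for title, url in songs:
        print(f"  {title}: {url}")
```

Because it relies on sibling order, this will not work if the real page nests songs differently. Treat it as a starting point rather than a robust scraper.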

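This works, thanks. Two questions: 1) How did you know to use BeautifulSoup(html_content, "lxml")? 2) How does soup.select work here? Does the second selector, "#listAlbum a", refer to the "a" in "a href"?

Your first question is unclear. As for your second question: soup.select works like CSS selectors, so you may want to look at the documentation. As for your last question: "#" is the symbol for an ID, so "#listAlbum" refers to id="listAlbum", and the "a" after the space then matches the <a> elements inside that element. Be sure to mark this as the answer. Thanks

Output of the first answer's code:
"album: "
"3 Feet High And Rising"
== $0
" (1989)"

Intro
The Magic Number
Change In Speak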
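Here is a small sketch for the two questions in the comments.

- The second argument to BeautifulSoup names the parser. "lxml" needs the third-party lxml package, while "html.parser" ships with Python. Either should handle markup this simple.
- In soup.select, "#listAlbum" is an ID selector, and the space is the descendant combinator. So "#listAlbum a" means every <a> inside the element with id="listAlbum".

The markup below is hypothetical, trimmed to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one link outside the list, two <a> tags inside it.
html = """
<a href="/elsewhere">Outside link</a>
<div id="listAlbum">
  <a id="19215"></a>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html">Intro</a>
</div>
"""
# Second argument = parser name. "html.parser" is built in;
# "lxml" (as used in the answer) requires the lxml package.
soup = BeautifulSoup(html, "html.parser")

container = soup.select_one("#listAlbum")  # '#' = ID selector -> <div id="listAlbum">
all_links = soup.select("a")               # every <a> in the document
in_list = soup.select("#listAlbum a")      # space = descendant: only <a> tags inside that div

print(container.name, len(all_links), len(in_list))  # → div 3 2
```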