Python: extracting HTML text between elements (Python 3.x, BeautifulSoup)


I want to scrape the albums and songs from this website using Beautiful Soup. The HTML looks like this:

<div id="listAlbum"> 
    <a id="19215"></a>
    <div class="album">
    "album: "
    <b>"3 Feet High And Rising"</b> == $0
    " (1989)"
  </div> 
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro
  </a>
  <br> 
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br> 
  <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">Change In Speak</a>
  <br>

I can't figure out how to get the songs. I tried this:

for s in soup(text = re.compile(r'target="_blank">')):
    print(s.parent)

Any ideas?

Try this. I hope it gives you the output you need:

from bs4 import BeautifulSoup

html_content='''
  <div id="listAlbum">
   <a id="19215">
   </a>
   <div class="album">
    "album: "
    <b>
     "3 Feet High And Rising"
    </b>
    == $0
    " (1989)"
   </div>
   <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">
    Intro
   </a>
   <br/>
   <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">
    The Magic Number
   </a>
   <br/>
   <a href="https://www.azlyrics.com/lyrics/delasoul/changeinspeak.html" target="_blank">
    Change In Speak
   </a>
   <br/>
  </div>
'''
soup = BeautifulSoup(html_content,"lxml")
for item in soup.select("#listAlbum .album,#listAlbum a"):
    print(item.text.strip())

Here is another approach:

## Prints every album
albums = soup.find_all(class_="album")
for album in albums:
    print(album.get_text())

## Prints every song
songs = soup.find_all('a', target="_blank")
for song in songs:
    print(song.get_text())
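
If the surrounding whitespace or the link targets matter, the same find_all calls can be combined with get_text(strip=True) and attribute access. The following is a minimal sketch, not part of the original answers: it assumes a shortened copy of the markup from the question, uses the built-in "html.parser" so that lxml does not have to be installed, and the names album_title and songs are chosen here for illustration only.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumption: the real
# page has the same structure, with more albums and songs).
html_content = '''
<div id="listAlbum">
  <div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">
    Intro
  </a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
</div>
'''

soup = BeautifulSoup(html_content, "html.parser")

# The album title sits in the <b> tag inside the .album div;
# strip('"') drops the literal quotes that are part of the text.
album_title = soup.find(class_="album").b.get_text(strip=True).strip('"')

# get_text(strip=True) removes the newlines and indentation around
# each song title; a["href"] reads the link target.
songs = [
    (a.get_text(strip=True), a["href"])
    for a in soup.find_all("a", target="_blank")
]

print(album_title)
for title, url in songs:
    print(title, url)
```

Filtering on target="_blank" also skips the empty <a id="19215"> anchor in the full markup, since that tag has no target attribute.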

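This works well, thanks. Two questions: 1) How did you know to use BeautifulSoup(html_content, "lxml")? 2) How does soup.select work here? Does the "a" in the second selector, "#listAlbum a", refer to the "a" in "a href"?

Your first question is unclear. As for the second: soup.select takes CSS selectors; you may want to look at the documentation. As for the last one: "#" is the CSS symbol for an ID, so "#listAlbum" refers to the element with id="listAlbum", and the "a" after the space selects the <a> tags inside that element. Please mark this as the answer if it helped. Thanks.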
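
To make the selector discussion concrete, here is a small sketch of what each part of "#listAlbum .album, #listAlbum a" matches. It is an illustration under assumptions rather than the answerer's code: the markup is a trimmed copy of the question's HTML, the link outside the container (pointing at example.com) is hypothetical and added only to show scoping, and "html.parser" is used because it ships with Python, whereas "lxml" is a separate install.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup, plus one hypothetical link
# outside #listAlbum to show that the selector is scoped.
html_content = '''
<div id="listAlbum">
  <a id="19215"></a>
  <div class="album">album: <b>"3 Feet High And Rising"</b> (1989)</div>
  <a href="https://www.azlyrics.com/lyrics/delasoul/intro.html" target="_blank">Intro</a>
  <br>
  <a href="https://www.azlyrics.com/lyrics/delasoul/themagicnumber.html" target="_blank">The Magic Number</a>
  <br>
</div>
<a href="https://example.com/" target="_blank">outside link</a>
'''

soup = BeautifulSoup(html_content, "html.parser")

# "#listAlbum": the element whose id attribute is "listAlbum"
container = soup.select_one("#listAlbum")

# "#listAlbum a": every <a> tag that is a descendant of that element
# (this includes the empty <a id="19215"> anchor)
links_inside = soup.select("#listAlbum a")

# "#listAlbum a[href]": the same, restricted to <a> tags with an href
songs = [a.get_text(strip=True) for a in soup.select("#listAlbum a[href]")]

# ".album" matches by class; a comma combines two selectors, and the
# results come back in document order
mixed = soup.select("#listAlbum .album, #listAlbum a[href]")

print(container.name)
print(len(links_inside))
print(songs)
print([tag.name for tag in mixed])
```

The link outside the container is never returned, because every selector above starts from #listAlbum.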

Output of the first snippet (indentation and blank lines trimmed):

"album: "
"3 Feet High And Rising"   
== $0
" (1989)"

Intro
The Magic Number
Change In Speak
