Get song lyrics from Genius using BeautifulSoup (Python 3.8)

I'm trying to scrape lyrics from Genius using BeautifulSoup, but when I try to print the lyrics I get no output. Here is my code:

import requests 
from bs4 import BeautifulSoup
songURL = requests.get("https://genius.com/Marshmello-and-bastille-happier-lyrics")
song = songURL.content
soup = BeautifulSoup(song, 'lxml')
lyrics = soup.find_all("section")
for lyr in lyrics:
    for lyr1 in lyrics.select("p"):
        print(lyr1.text)      

Why doesn't this work? Could someone please take a look? I've been trying for a while now.

If you look at the actual HTML source, there is no <section> tag. Here is what the structure actually looks like:

<div class="song_body column_layout" initial-content-for="song_body">
  <div class="column_layout-column_span column_layout-column_span--primary">
    <div class="song_body-lyrics">
      
        <h2 class="text_label text_label--gray text_label--x_small_text_size u-top_margin">Happier Lyrics</h2>
      
      <div initial-content-for="lyrics">
        <div class="lyrics">
          
            <!--sse-->
            <p>[Intro]<br>
Lately, I've been, I've been thinking<br>
I want you to be happier, I want you to be happier<br>
<br>
...
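
One way to confirm this without opening a browser is to count what the parser actually finds. The sketch below is only an illustration: it uses a trimmed copy of the markup above as an inline string so it runs offline, and it uses the built-in html.parser instead of lxml. Both choices are assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the structure shown above, inlined so the example runs offline.
SAMPLE_HTML = """
<div class="song_body-lyrics">
  <h2>Happier Lyrics</h2>
  <div class="lyrics">
    <p>[Intro]<br>
Lately, I've been, I've been thinking<br>
I want you to be happier, I want you to be happier<br>
    </p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# This is what the question's code searched for.
sections = soup.find_all("section")
# This is the element that actually holds the lyrics in this markup.
lyric_divs = soup.find_all("div", class_="lyrics")

print(len(sections))    # 0, so the body of the outer for-loop never runs
print(len(lyric_divs))  # 1
```

Because find_all("section") returns an empty result, the loops in the question never execute, which is why nothing is printed.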

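The server seems to return two versions of the page: in one version the lyrics are under a tag with class="song_body-lyrics", and in the other they are under a tag with class="Lyrics__Container...".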

This script tries to handle both cases:

import requests 
from bs4 import BeautifulSoup

url = 'https://genius.com/Marshmello-and-bastille-happier-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):
    t = tag.get_text(strip=True, separator='\n')
    if t:
        print(t)

Prints:

[Intro]
Lately, I've been, I've been thinking
I want you to be happier, I want you to be happier
[Verse 1]

...and so on.
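
To see that the same selector covers both page versions, here is a minimal offline sketch. The two HTML strings are hypothetical stand-ins for the two layouts (the "-abc123" class suffix is invented), extract_lyrics is an illustrative helper name, and html.parser is used instead of lxml so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Same selector list as in the script above: either half may match.
SELECTOR = 'div[class^="Lyrics__Container"], .song_body-lyrics p'


def extract_lyrics(html):
    """Return the lyric lines matched by either half of the selector list."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for tag in soup.select(SELECTOR):
        text = tag.get_text(strip=True, separator="\n")
        if text:
            lines.extend(text.split("\n"))
    return lines


# Hypothetical stand-ins for the two page versions (not real Genius markup).
OLD_LAYOUT = (
    '<div class="song_body-lyrics"><div class="lyrics">'
    "<p>[Intro]<br>Line one<br>Line two</p></div></div>"
)
NEW_LAYOUT = '<div class="Lyrics__Container-abc123">[Intro]<br>Line one<br>Line two</div>'

print(extract_lyrics(OLD_LAYOUT))
print(extract_lyrics(NEW_LAYOUT))
```

Both calls return the same three lines, and a page matching neither layout yields an empty list rather than an error.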

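You should get all the text inside one specific div. You can find that div using your browser's devtools or view-source. Here the specific div is <div class="lyrics">. What makes this div unique is its class, "lyrics", so we should find this div in the HTML and then print all the text inside it.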

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://alirezaarabi.com/view-source_https___genius.com_Alessia-cara-ready-lyrics.html').read()

soup = bs.BeautifulSoup(source,'lxml')
print(soup.title.string)

for div in soup.find_all('div', class_='lyrics'):
    print(div.text)
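
If the page is served in the other layout, there is no div with class "lyrics" at all, so it can help to guard against a missing match. The following is a small sketch under that assumption; lyrics_from_div is a hypothetical helper name, the sample strings are made up, and html.parser stands in for lxml.

```python
from bs4 import BeautifulSoup


def lyrics_from_div(html):
    """Return the text of the first <div class="lyrics">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", class_="lyrics")
    if div is None:
        # No such div in this page version; let the caller decide what to do.
        return None
    # separator="\n" keeps each <br>-separated lyric line on its own line.
    return div.get_text(separator="\n", strip=True)


# Made-up sample markup for illustration only.
SAMPLE = '<div class="lyrics"><p>[Intro]<br>Line one<br>Line two</p></div>'

print(lyrics_from_div(SAMPLE))
print(lyrics_from_div('<div class="other">no lyrics here</div>'))
```

Returning None makes the "nothing matched" case explicit instead of silently printing nothing.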

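So which match did it select: the div tag whose class starts with Lyrics__Container, or the p tags under the song_body-lyrics class? Wow, that's powerful.
@politicalscientist Yes, that is a CSS selector with a comma, so elements matching either part are selected. BeautifulSoup supports it well, and it's handy.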