How to get only the mp3 links using BeautifulSoup and Python


Here is my code:

from bs4 import BeautifulSoup
import urllib.request
import re

url = urllib.request.urlopen("http://www.djmaza.info/Abhi-Toh-Party-Khubsoorat-Full-Song-MP3-2014-Singles.html")
content = url.read()
soup = BeautifulSoup(content)
for a in soup.findAll('a',href=True):
    if re.findall('http',a['href']):
        print ("URL:", a['href'])

Output of this code:

URL: http://twitter.com/mp3khan
URL: http://www.facebook.com/pages/MP3KhanCom-Music-Updates/233163530138863
URL: https://plus.google.com/114136514767143493258/posts
URL: http://www.djhungama.com
URL: http://www.djhungama.com
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3
URL: http://www.htmlcommentbox.com
URL: http://www.djmaza.com
URL: http://www.djhungama.com

I only need the .mp3 links.

So, how should I rewrite the code?

Thanks

You can use str.endswith(). For example:

if re.findall('http',a['href']) and a['href'].endswith(".mp3"):
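As a minimal sketch of the same check outside the loop, the snippet below filters a plain list of href strings. The sample hrefs and the helper name is_http_mp3 are made up for illustration. It also swaps re.findall('http', ...) for startswith("http"), which is a stricter test (the match must be at the start of the string), and lowercases the href so that ".MP3" is accepted too.

```python
# Sample href values standing in for a['href']; these are illustrative only.
hrefs = [
    "http://twitter.com/mp3khan",
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3",
    "/local/file.mp3",
    "http://www.djmaza.com",
]


def is_http_mp3(href):
    """Return True for absolute http(s) links whose text ends in .mp3."""
    return href.startswith("http") and href.lower().endswith(".mp3")


mp3_links = [h for h in hrefs if is_http_mp3(h)]
for link in mp3_links:
    print("URL:", link)
```

Note that "http://twitter.com/mp3khan" is rejected because it ends in "mp3khan", not ".mp3".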

Change your findAll call to match with a regular expression, for example:

# Raw string so that \. reaches the regex engine as an escaped dot
for a in soup.findAll('a', href=re.compile(r'http.*\.mp3')):
    print("URL:", a['href'])
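To try the regular-expression filter without fetching the page, here is a small self-contained sketch. The inline HTML and the example.com URLs are invented for the demo, and the pattern is anchored with ^ and $ (an addition to the pattern above) so that only hrefs that start with http and end with .mp3 are kept. It assumes bs4 is installed and uses the built-in "html.parser" so no extra parser is needed.

```python
import re

from bs4 import BeautifulSoup

# Invented sample markup; the real page would come from urllib.request.urlopen(...).read()
html = """
<a href="http://twitter.com/mp3khan">Twitter</a>
<a href="http://songs.example.com/music/track-190Kbps.mp3">190 Kbps</a>
<a href="http://songs.example.com/music/track-320Kbps.mp3">320 Kbps</a>
<a href="/relative/song.mp3">relative link</a>
<a name="top">anchor without href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Anchored pattern: href must begin with "http" and end with ".mp3"
pattern = re.compile(r"^http.*\.mp3$")

mp3_links = [a["href"] for a in soup.find_all("a", href=pattern)]
for link in mp3_links:
    print("URL:", link)
```

Tags without an href attribute are skipped automatically, so the href=True argument is no longer needed.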

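Update in response to the comment:

I need to store these links in an array for downloading. How do I do that?

You can use a list comprehension to build the list: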

links = [a['href'] for a in soup.find_all('a', href=re.compile(r'http.*\.mp3'))]
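The output in the question lists each mp3 link twice, so the list built this way would contain duplicates. One possible way to drop them while keeping the original order is dict.fromkeys, sketched here on a hard-coded copy of those four links (the variable name unique_links is just for illustration):

```python
# The four mp3 hrefs from the question's output, duplicates included
links = [
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3",
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -190Kbps [DJMaza.Info].mp3",
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3",
    "http://songs.djmazadownload.com/music/Singles/Abhi Toh Party (Khoobsurat) -320Kbps [DJMaza.Info].mp3",
]

# dict keys are unique and keep insertion order, so this removes repeats in place
unique_links = list(dict.fromkeys(links))

for link in unique_links:
    print("URL:", link)
```

A set would also remove duplicates, but it does not preserve the order in which the links appeared on the page.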

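If you are only interested in the extension, keep in mind that endswith() returns a boolean, not the file's extension. It is better to build your own function for this purpose and use it like this: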

if re.findall('http', a['href']) and isMP3file(a['href']):

Now you can define the function like this:

import os
def isMP3file(link):
    name, ext = os.path.splitext(link)
    return ext.lower() == '.mp3'
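One caveat with applying os.path.splitext to a whole URL is that a query string or fragment (for example "?token=abc") becomes part of the "extension". A possible variant, sketched below under the hypothetical name is_mp3_url, first extracts the path with urllib.parse.urlparse. The example.com URLs are made up for the demo.

```python
import os
from urllib.parse import urlparse


def is_mp3_url(link):
    """Return True if the path component of the URL has an .mp3 extension."""
    path = urlparse(link).path          # drops ?query and #fragment
    _, ext = os.path.splitext(path)     # ext includes the leading dot
    return ext.lower() == ".mp3"


# Illustrative inputs only
candidates = [
    "http://example.com/music/song.MP3",
    "http://example.com/music/song.mp3?token=abc",
    "http://twitter.com/mp3khan",
    "http://example.com/page.html",
]
mp3_only = [url for url in candidates if is_mp3_url(url)]
for url in mp3_only:
    print("URL:", url)
```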

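Thanks for your answer! It helped me a lot. I would like to translate this post to share with my Korean friends. It will be posted. If you mind, please let me know and I will delete it. Thank you very much... :D

@MuneebK You're welcome. On another note, when you use bs4 you may want to use .find_all instead of findAll, because the latter is BS3 style and is kept only for backward compatibility, so it may be removed at some point. It is best to get into the habit of using the something_something functions rather than the somethingSomething functions.

I need to store these links in an array for downloading. How do I do that? Thanks again. :) I'm really new to Python and BeautifulSoup. I'm doing a project for college: a mass downloader for Linux that helps download many kinds of files such as mp3, jpg, etc. So your suggestions will help me.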