Parsing relative and absolute links with Python

Tags: python, html, python-3.x, beautifulsoup, html-parsing

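This is a project for downloading images, audio, video, and so on. On some websites, however, the links are not complete; they are only relative paths, and I don't know how to resolve these relative links.

My whole project is at:

https://github.com/MuneebKalathil/MaD

Here is my sample link. I want to download all the images from it. The page has thumbnail images, but those are not what I want. Clicking a thumbnail leads to the original image page, and those are the images I want to download.

http://www.ragalahari.com/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars.aspx

Part of the source is: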

<tr>
<td id='pagingCell'>
</td>
</tr>
<tr>
<td align='center'><div id='galdiv' style='float:center;margin-right:3px;;margin-bottom:3px'>
<a href='/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars/image1.aspx' ><img src="http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham1t.jpg" alt="Kajal Aggarwal" title="Kajal Aggarwal at Dine with Stars Memu Saitham"></a>

I want to find its absolute path.

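Use urljoin (urllib.parse.urljoin in Python 3). Pass the URL of the page as its first argument. As the second argument, pass the href or any other URL that might be relative. It handles both absolute and relative URLs correctly, resolving them to a final absolute URL.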
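
A minimal sketch of that behaviour, using only the standard library. The page URL, the href, and the image src are taken from the question; the variable names are just for illustration.

```python
from urllib.parse import urljoin

page_url = 'http://www.ragalahari.com/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars.aspx'

# A root-relative href, as found on the thumbnail links in the question.
relative_href = '/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars/image1.aspx'
# An already absolute URL, as found in the img src attributes.
absolute_src = 'http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham1t.jpg'

# The relative href is resolved against the scheme and host of the page URL.
resolved_href = urljoin(page_url, relative_href)
# The absolute URL is returned unchanged, so no startswith('http') check is needed.
resolved_src = urljoin(page_url, absolute_src)

print(resolved_href)
print(resolved_src)
```

Because urljoin leaves absolute URLs alone, it can be applied to every href or src without first checking which kind it is.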


If you are still using Python 2, urljoin is in the urlparse module.
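
If the same script has to run on both versions, one common pattern is a guarded import. This is only a sketch; the URL pieces are taken from the question and its output.

```python
try:
    # Python 3 location.
    from urllib.parse import urljoin
except ImportError:
    # Python 2 fallback.
    from urlparse import urljoin

icon_url = urljoin('http://www.ragalahari.com', '/images/helpicon.png')
print(icon_url)
```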

Define the base URL, find all img tags, and if the src attribute value does not start with http, use urljoin to join the base URL and the src.

For example, using requests and BeautifulSoup:

from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

base_url = 'http://www.ragalahari.com'
url = 'http://www.ragalahari.com/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars.aspx'

# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Only consider img tags that actually carry a src attribute.
for img in soup.find_all('img', src=True):
    src = img.get('src')
    if not src.startswith('http'):
        # Relative path: resolve it against the site root.
        src = urljoin(base_url, src)

    print(src)

Prints:

http://icdn.raagalahari.com/images/ragalaharilogo.png
http://www.ragalahari.com/images/helpicon.png
http://www.ragalahari.com/images/rssicon.png
http://www.ragalahari.com/images/twittericon.png
http://www.ragalahari.com/images/facebookicon.png
http://www.ragalahari.com/images/searchicon.png
http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham1t.jpg
http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham2t.jpg
http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham3t.jpg
http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham4t.jpg
...

Update (partial code for getting the a links):

# Select only the anchors inside the gallery div that have an href.
for a in soup.select('div#galdiv a[href]'):
    link = a.get('href')
    if not link.startswith('http'):
        link = urljoin(base_url, link)

    print(link)

Comments:

Can you write the code in Python 3? I'm getting some errors :( This program only gets the image links. I want the href links next to the src links too.

@MuneebK do you mean the regular a links?

@alexce, what I want is the absolute link of …. I edited the question: …) .. new to Python :(

It's working… :D … Can I contact you if I have more doubts? Do you have fb, gmail, or yahoo?
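
As a network-free variation on the update code, the sketch below resolves the gallery links from the HTML fragment in the question. It makes several assumptions that are not in the original answer: the standard library's html.parser is swapped in for BeautifulSoup so that nothing has to be installed or fetched, the closing tags are added to the fragment by hand, GalleryLinkParser is a hypothetical helper name, and links are joined against the page URL rather than a hard-coded base_url.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

page_url = 'http://www.ragalahari.com/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars.aspx'

# Fragment adapted from the question; closing tags are added here (assumption).
html = """
<div id='galdiv'>
<a href='/actress/14035/kajal-aggarwal-at-memu-saitham-dinner-with-stars/image1.aspx'><img src="http://imgcdn.raagalahari.com/nov2014/starzone/kajal-agarwal-memu-saitham/kajal-agarwal-memu-saitham1t.jpg" alt="Kajal Aggarwal"></a>
</div>
"""


class GalleryLinkParser(HTMLParser):
    """Hypothetical helper: collects absolute hrefs of a tags inside div#galdiv."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.div_depth = 0  # greater than 0 while inside div#galdiv
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            # Start counting at div#galdiv and keep counting nested divs.
            if self.div_depth or attrs.get('id') == 'galdiv':
                self.div_depth += 1
        elif tag == 'a' and self.div_depth and attrs.get('href'):
            # urljoin resolves relative hrefs and leaves absolute ones unchanged.
            self.links.append(urljoin(self.base, attrs['href']))

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1


parser = GalleryLinkParser(page_url)
parser.feed(html)
print(parser.links)
```

Joining against the page URL instead of the site root also covers hrefs that are relative to the current document rather than to the root, which a fixed base_url would resolve incorrectly.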