Python BeautifulSoup isn't returning the full image address

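I'm using BeautifulSoup to scrape images from a website, but my code doesn't return the full address of the image that is visible when viewing the page.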

for b in soup.select(".thumb_div.clear a"):
    imagelink = a["href"].replace("/mushrooms/", "http://www.foragingguide.com/mushrooms/")
    print(imagelink)

It should return:

http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg

because the page source is:


<a href="http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg" rel="lightbox[photos]" title="Amethyst Deceiver (Laccaria amethystina)">

But it only returns

http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/

without the .jpg file name at the end, which is required for this to work.

Does anyone know why this happens?

Thanks.

You don't actually need the replace at all; just target the image links directly.

For example:

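import requests
from bs4 import BeautifulSoup

end_point = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"
response = requests.get(end_point).text
# Select every <a> inside an element with class "thumb_div"
soup = BeautifulSoup(response, "lxml").select(".thumb_div a")
print("\n".join(i["href"] for i in soup))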

Output:

http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/90.jpg
http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/91.jpg

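Simple solution

It turns out that the a in a["href"] has nothing to do with the <a> tag: it refers to an iterable named a, which does not exist in this loop (the loop variable is b). Changing the code to b["href"] works.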

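Comments:

Why is the replace needed? Aren't the links absolute?

It doesn't return an absolute link, only a relative path, so I used a replace as a workaround.

Nice solution, but I'm trying to get the path of the regular images, not the thumbnails. Can I follow the same process for the regular images linked in the a tag?

See the updated answer.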
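On the relative-versus-absolute question raised in the comments: rather than patching paths with a string replace, one option is the standard library's urljoin, which resolves a relative href against the page URL and leaves an already absolute href unchanged. This is only a sketch; the relative path below is a hypothetical example and is not taken from the site.

```python
from urllib.parse import urljoin

# Page the links were scraped from
base_url = "http://www.foragingguide.com/mushrooms/sp/amethyst_deceiver"

# Hypothetical relative href, for illustration only
relative_href = "/mushrooms/sp/fly_agaric"
# Absolute href as it appears in the page source quoted in the question
absolute_href = "http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg"

# A relative path is resolved against the scheme and host of base_url
resolved_relative = urljoin(base_url, relative_href)
# An absolute URL is returned as-is
resolved_absolute = urljoin(base_url, absolute_href)

print(resolved_relative)
print(resolved_absolute)
```

With this approach the same line handles both kinds of href, so there is no need to know in advance which form the page uses.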

for b in soup.select(".thumb_div a"):
    imagelink = b["href"]
    print(imagelink)
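To illustrate why reading the wrong variable inside the loop gives an unexpected value rather than an error, here is a minimal sketch that uses plain dictionaries in place of BeautifulSoup tags. It assumes a variable named a was left over from earlier code, which the question does not show; the stale value and the sample list are hypothetical.

```python
# Hypothetical stand-ins for the <a> tags matched by soup.select(...)
links = [
    {"href": "http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/87.jpg"},
    {"href": "http://static.foragingguide.com/photos/mushrooms/amethyst_deceiver/88.jpg"},
]

# Assumption: a stale variable left over from earlier code
a = {"href": "/mushrooms/sp/amethyst_deceiver"}

# Buggy: reads the stale variable a, so every iteration yields the same value
buggy = [a["href"] for b in links]

# Fixed: reads the loop variable b, so each link's own href is returned
fixed = [b["href"] for b in links]

print(buggy)
print(fixed)
```

Had no variable named a existed at all, the buggy version would have raised a NameError instead, which is usually easier to spot.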