Why won't a Python script download .png files when it works for other image types?


python image file


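I'm trying to search through the source code of a web page and download various files from it using Python. The script searches the source for .jpg files and downloads them as expected. However, after modifying the script (changing ".jpg" to ".png", as shown below), I get this error: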

Traceback (most recent call last):
File "img.py", line 19, in <module> urllib.urlretrieve(images[z], "image"+str(z)+".png")
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 91, in urlretrieve
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 237, in retrieve
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 205, in open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 461, in open_file
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 475, in open_local_file
IOError: [Errno 2] No such file or directory: '/images/adapt-icon-search.png?1342791397'


Here is the script I'm using:

import urllib
import urllib2
import re

print "enter url of site (such as 'dribbble.com')"

url = raw_input()
fullurl = "http://"+url

src = urllib2.urlopen(fullurl)
src = src.read()

images = re.findall('src="(.*\.png[^"]*)', src)

z=0
while z < len(images):
    urllib.urlretrieve(images[z], "image"+str(z)+".png")
    print "done"
    z+=1



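Any insight into why this script doesn't work for .png files would be greatly appreciated. Thanks very much!

Update: here is a sample of the source code I want to search: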
<span rel="tipsy" title="This shot has rebounds." class="rebound-mark has-rebounds">1</span>
                </a>            
        </div>
    </div>
    <h2>
        <a href="/Dash" class="url" rel="contact" title="Dash"><img alt="Avatar-new" class="photo fn" src="http://dribbble.s3.amazonaws.com/users/107759/avatars/original/avatar-new.png?1339961321" /> Dash</a>
        <a href="/account/pro" class="badge-link">
    <span class="badge badge-pro">Pro</span>
</a>
    </h2>

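So, the error you're getting is:

IOError: [Errno 2] No such file or directory: '/images/adapt-icon-search.png?1342791397'

What's happening is that the web page you're scraping contains some PNG references that don't include the domain name in the URL. When you try to fetch them in your while loop, it fails because you're supplying only the location on the remote host: /images/adapt-icon-search.png?1342791397. With no scheme or host in front of it, urllib treats the string as a path on your own machine, which is why the traceback ends in open_local_file.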
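To see why the string is treated as a local path, it can help to split a scraped value into its parts. The following is only a minimal sketch, written for Python 3 (where the old `urlparse` module lives in `urllib.parse`) rather than the Python 2.7 shown in the question. The helper name `is_absolute` is hypothetical and not part of any library.

```python
from urllib.parse import urlparse


def is_absolute(url):
    """Return True if the URL carries both a scheme and a host name."""
    parts = urlparse(url)
    return bool(parts.scheme) and bool(parts.netloc)


# The value from the traceback: a path and a query string, but no scheme or host.
broken = "/images/adapt-icon-search.png?1342791397"
# The value from the sample HTML in the question: a complete URL.
working = "http://dribbble.s3.amazonaws.com/users/107759/avatars/original/avatar-new.png?1339961321"

for candidate in (broken, working):
    label = "absolute" if is_absolute(candidate) else "needs a base URL"
    print(candidate, "->", label)
```

Only values reported as absolute can be handed to the download call unchanged; the others have to be combined with the address of the page they came from.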

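You'll need to extend your code to detect these kinds of URLs (which are perfectly legal and, in fact, very common). For the type you're hitting, you just need to prepend the server's hostname (e.g. http://dribble.com/) to the matched URL.

You may also want to handle relative URLs, which likewise omit the hostname but don't begin with a / character. Those need to be prefixed with the path of the page they appeared on, if it has one. So if you're scraping http://dribble.com/foo/bar.html, you'd prepend http://dribble.com/foo/ to each relative URL.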
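Both cases can be handled by the standard library's `urljoin`, which resolves a reference against the URL of the page it was found on. This is a sketch for Python 3, where the function is `urllib.parse.urljoin` (in Python 2 it is `urlparse.urljoin`). The page URL is the example used above, and `icons/logo.png` is a made-up relative reference for illustration.

```python
from urllib.parse import urljoin

page_url = "http://dribble.com/foo/bar.html"  # page being scraped (example from the answer)

# Root-relative reference: the page's path is replaced, scheme and host are kept.
root_relative = urljoin(page_url, "/images/adapt-icon-search.png?1342791397")

# Document-relative reference (hypothetical): resolved against the page's directory.
doc_relative = urljoin(page_url, "icons/logo.png")

# Reference that is already absolute: returned unchanged.
absolute = urljoin(
    page_url,
    "http://dribbble.s3.amazonaws.com/users/107759/avatars/original/avatar-new.png?1339961321",
)

print(root_relative)  # http://dribble.com/images/adapt-icon-search.png?1342791397
print(doc_relative)   # http://dribble.com/foo/icons/logo.png
print(absolute)
```

Because absolute references pass through untouched, every scraped value can be run through `urljoin` without first checking which kind it is.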

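There may be a library that handles non-absolute URLs for you automatically as part of the web scraping process. I'm afraid I don't have much first-hand experience with web scraping, but perhaps someone else can recommend one in the comments.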
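
Can you post a longer traceback?

Sure! I just updated my question to include the full traceback.

Add a ? to the characters excluded in your regex: images = re.findall('src="(.*\.png[^"]*)"', src)

I think your regex is matching more than just the filename, hence the error (i.e. it can't find '/images/adapt-icon-search.png?1342791397').

@inspectorG4dget I added the "?" as instructed, but it returned the same traceback.

This is the correct diagnosis. urlparse.urljoin comes in handy here.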
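
Putting the two comments together, one possible sketch (Python 3, no network access) tightens the pattern so a match cannot run past the closing quote, then resolves each match with `urljoin`. The name `find_png_urls` is hypothetical, and the second `<img>` tag is assumed markup built around the path from the traceback; only the first tag is taken from the sample source in the question.

```python
import re
from urllib.parse import urljoin

# [^"]* cannot cross a double quote, so each match stays inside one src attribute.
PNG_SRC = re.compile(r'src="([^"]*\.png[^"]*)"')


def find_png_urls(html, page_url):
    """Return an absolute URL for every .png referenced by a src attribute."""
    return [urljoin(page_url, src) for src in PNG_SRC.findall(html)]


html = (
    '<img alt="Avatar-new" class="photo fn" '
    'src="http://dribbble.s3.amazonaws.com/users/107759/avatars/original/avatar-new.png?1339961321" />'
    '<img src="/images/adapt-icon-search.png?1342791397" />'  # assumed markup
)

png_urls = find_png_urls(html, "http://dribbble.com")
for url in png_urls:
    print(url)
```

Each resulting URL now includes a scheme and host, so it could be passed to the download call (`urllib.request.urlretrieve` in Python 3) in place of the raw match.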