Scrapy: customizing the images pipeline by renaming the default image names
I am using the images pipeline to download all the images from different websites. All the images download successfully into the folder I defined, but before they are saved to disk I am unable to give them a name of my choice. Here is my code:

pipelines.py
Image_spider.py
def getImage(self, response):
    item = JellyfishItem()
    item['image_urls'] = [response.url]
    item['image_name'] = response.meta['image_name']
    return item
What changes do I need to make in my code?
Update 1

pipelines.py
class jellyImagesPipeline(ImagesPipeline):

    def image_custom_key(self, response):
        print '\n\n image_custom_key \n\n'
        name = response.meta['image_name'][0]
        img_key = 'full/%s.jpg' % (name)
        print "custom image key:", img_key
        return img_key

    def get_images(self, response, request, info):
        print "\n\n get_images \n\n"
        for key, image, buf in super(jellyImagesPipeline, self).get_images(response, request, info):
            yield key, image, buf
        key = self.image_custom_key(response)
        orig_image = Image.open(StringIO(response.body))
        image, buf = self.convert_image(orig_image)
        yield key, image, buf

    def get_media_requests(self, item, info):
        print "\n\nget_media_requests\n"
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]
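One thing worth noticing about this attempt: get_images first re-yields everything from the built-in pipeline (stored under the default SHA1-based key) and only then yields a second copy under the custom key, so each image would be stored twice. A minimal stand-in with plain generators (no Scrapy required; all names and keys here are illustrative) makes the flow visible:

```python
def builtin_get_images():
    # stand-in for the built-in ImagesPipeline.get_images():
    # yields the image under its default SHA1-based key
    yield ('full/0a1b2c3d.jpg', '<image>', '<buf>')

def custom_get_images():
    # mirrors the logic of the Update 1 code: pass the default result
    # through unchanged, then yield a second copy under the custom name
    for key, image, buf in builtin_get_images():
        yield key, image, buf
    yield ('full/my_image_name.jpg', '<image>', '<buf>')

results = list(custom_get_images())
print([key for key, image, buf in results])
# -> ['full/0a1b2c3d.jpg', 'full/my_image_name.jpg']
```

Both keys end up in the results, which is why renaming the key in place (as in the answer below the updates) avoids the duplicate file.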
Update 2

In pipelines.py:
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from PIL import Image
from cStringIO import StringIO
import re

class jellyImagesPipeline(ImagesPipeline):

    # matches the default SHA1-based image key, e.g. "full/0a1b2c...jpg"
    CONVERTED_ORIGINAL = re.compile(r'^full/[0-9a-f]+\.jpg$')

    # name information comes from the spider, in each item;
    # pass it along to the Request() for each individual image download
    # through the "meta" dictionary
    def get_media_requests(self, item, info):
        print "get_media_requests"
        return [Request(x, meta={'image_name': item["image_name"]})
                for x in item.get('image_urls', [])]

    # this is where the image is extracted from the HTTP response
    def get_images(self, response, request, info):
        print "get_images"
        for key, image, buf in super(jellyImagesPipeline, self).get_images(response, request, info):
            if self.CONVERTED_ORIGINAL.match(key):
                key = self.change_filename(key, response)
            yield key, image, buf

    def change_filename(self, key, response):
        return "full/%s.jpg" % response.meta['image_name'][0]
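The core of the renaming step is a pure string transformation: if a key looks like the default SHA1-based name, swap in the name carried through the request meta; anything else passes through untouched. Factored out of Scrapy as a standalone sketch (function and variable names here are illustrative, not part of Scrapy's API):

```python
import re

# the default keys produced by the images pipeline look like "full/<sha1>.jpg"
DEFAULT_KEY = re.compile(r'^full/[0-9a-f]+\.jpg$')

def rename_key(key, image_name):
    # replace the default hash-based key with the supplied name;
    # leave any other key (e.g. thumbnails) unchanged
    if DEFAULT_KEY.match(key):
        return 'full/%s.jpg' % image_name
    return key

print(rename_key('full/3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg', 'jellyfish'))
# -> full/jellyfish.jpg
print(rename_key('thumbs/small/3afec3b4.jpg', 'jellyfish'))
# -> thumbs/small/3afec3b4.jpg
```

Guarding on the default-key pattern is what keeps thumbnail keys (or already-renamed keys) from being clobbered.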
In settings.py, make sure you have:
ITEM_PIPELINES = ['jelly.pipelines.jellyImagesPipeline']
IMAGES_STORE = '/path/to/where/you/want/to/store/images'
Example spider: it fetches the images from Python.org's homepage; the names (and paths) under which the images are saved will follow the website structure, i.e. sit under a folder named www.python.org
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item, Field
import urlparse

class CustomItem(Item):
    image_urls = Field()
    image_name = Field()
    images = Field()

class ImageSpider(BaseSpider):
    name = "customimg"
    allowed_domains = ["www.python.org"]
    start_urls = ['http://www.python.org']

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        sites = hxs.select('//img')
        items = []
        for site in sites:
            item = CustomItem()
            item['image_urls'] = [urlparse.urljoin(response.url, u) for u in site.select('@src').extract()]
            # the name information for your image
            item['image_name'] = ['whatever_you_want']
            items.append(item)
        return items
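Since image_name ends up as part of a filesystem path, a name pulled from a database or a web page may contain characters that are unsafe in filenames (slashes, spaces, etc.). A small helper can guard against that before the item is built; this is not part of Scrapy, just a hedged sketch with illustrative names:

```python
import re

def safe_image_name(name, default='unnamed'):
    # keep letters, digits, dash and underscore; collapse every other
    # run of characters into a single underscore
    cleaned = re.sub(r'[^A-Za-z0-9_-]+', '_', name).strip('_')
    return cleaned or default

print(safe_image_name('Blue Jellyfish / front view'))
# -> Blue_Jellyfish_front_view
```

In the spider above, item['image_name'] = [safe_image_name(raw_name)] would then guarantee the pipeline only ever sees a valid path component.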
What is response.meta['image_name']? Does it depend only on the URL? Or maybe on @alt or @title?

response.meta['image_name'] is retrieved from a MySQL table; it does not depend on the URL.

If it is completely independent of the URL, a simpler solution may be possible, see .

Thanks for the answer, but it does not solve my problem; the image names are still unchanged. Please help me dig a little further.

I edited my answer with tested code. You have two options: create another image with the new name, or change the name of the original JPEG-converted image from the built-in ImagesPipeline.

Thanks for the answer @paul, I really appreciate your effort. I would like to go with the second option you suggested, i.e. changing the name of the original JPEG-converted image in the built-in ImagesPipeline. I updated the code in the question as you suggested, but the program never enters get_images and image_custom_key, so the images are downloaded without the name change.

Did you set ITEM_PIPELINES = ['yourprojectname.pipelines.jellyImagesPipeline'] in your settings.py file? Also, make sure the pipeline class name is spelled consistently (including case) in the class definition and in the super() call.