Python Scrapy also saves an images array in the JSON file, not just the URL


python scrapy web-crawler


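After learning how to download images correctly with Scrapy, I am now trying to generate a clean JSON file that contains only the image URLs. However, Scrapy also saves an empty images array, which I don't care about at the moment.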

def parse(self, response):
    # Collect every <img src> value on the page; some may be relative.
    raw_image_urls = response.xpath(".//img/@src").getall()
    # Resolve each value against the page URL.
    clean_image_urls = []
    for img_url in raw_image_urls:
        clean_image_urls.append(response.urljoin(img_url))
    for clear_url in clean_image_urls:
        yield {
            'image_url': clear_url,
        }

This produces:

{"image_url": "https://image.shutterstock.com/image-photo/deep-forest-river-wild-waterfall-260nw-1585363855.jpg", "images": []},

instead of just:

{"image_url": "https://image.shutterstock.com/image-photo/deep-forest-river-wild-waterfall-260nw-1585363855.jpg"},

which is exactly what I need.
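For context on the URL-cleaning step in parse: response.urljoin resolves each src value against the page URL, much like the standard library's urljoin does. The sketch below uses only the standard library; the page URL and src values are hypothetical, and it ignores any <base> tag that an HTML response might take into account.

```python
from urllib.parse import urljoin

# Hypothetical page URL and <img src> values, for illustration only.
page_url = "https://example.com/gallery/index.html"
raw_srcs = ["/static/a.jpg", "thumbs/b.png", "https://cdn.example.com/c.gif"]

# Root-relative and page-relative values become absolute;
# values that are already absolute pass through unchanged.
absolute_urls = [urljoin(page_url, src) for src in raw_srcs]

for url in absolute_urls:
    print(url)
```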

I modified the pipeline as follows:

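from scrapy.pipelines.images import ImagesPipeline

class customImagePipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Name the stored file after the last segment of the image URL.
        return request.url.split('/')[-1]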

This should give me the correct image names.
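One caveat with split('/')[-1]: if an image URL carries a query string, the query string ends up in the file name. A possible variant, sketched here with the standard library only, takes the name from the URL path instead. The helper name name_from_url and the second URL are hypothetical and not part of the original pipeline.

```python
import posixpath
from urllib.parse import urlsplit

def name_from_url(url):
    """Return the last path segment of url, ignoring any ?query or #fragment."""
    return posixpath.basename(urlsplit(url).path)

plain = "https://image.shutterstock.com/image-photo/deep-forest-river-wild-waterfall-260nw-1585363855.jpg"
with_query = "https://example.com/img/a.jpg?size=large"  # hypothetical URL

print(name_from_url(plain))
print(with_query.split('/')[-1])   # the naive split keeps the query string
print(name_from_url(with_query))   # the path-based variant drops it
```

Inside file_path, such a helper could be called as name_from_url(request.url).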

The pipeline's item_completed method sets that field, so you need to override it so that it does nothing:

def item_completed(self, results, item, info):
    return item
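To see why the override removes the extra key, here is a plain-Python model of the two behaviours. It is a simplified stand-in, not Scrapy's actual source: the function names are hypothetical, and it assumes results is a list of (success, info) pairs and that the result field is named "images".

```python
# Simplified stand-in for the default behaviour: store the successful
# download results on the item under the result field.
def default_item_completed(results, item, result_field="images"):
    item[result_field] = [info for ok, info in results if ok]
    return item

# The override from the answer: hand the item back untouched.
def overridden_item_completed(results, item):
    return item

url = "https://example.com/a.jpg"  # hypothetical image URL

# With no download results reported, the default still adds an empty list.
with_field = default_item_completed([], {"image_url": url})
without_field = overridden_item_completed([], {"image_url": url})

print(with_field)
print(without_field)
```

In the real pipeline the override goes on customImagePipeline next to file_path, so the exported item keeps only the keys the spider yielded.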

What exactly is your question? Does the solution you provided work?