Scrapy: How do I manually execute a Request object in Scrapy?

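I'm trying to use Scrapy to download an HTML-only website, using the CrawlSpider class. Below is what my parser looks like. My crawler downloads the HTML source of each page and builds a local mirror of the site. The mirroring works, but the images are missing. To download the images attached to each page, I tried adding: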

from scrapy import Request

def parse_link(self, response):
    # Download the source of the page

    # CODE HERE

    # Now search for images
    imgs = response.xpath('//img/@src').getall()

    # Download images
    for i in imgs:
        r = Request(response.urljoin(i), callback=self.parse_link)
        # execute the request here

In the examples in …, the parser seems to return Request objects, which then get executed.

Is there a way to execute a Request manually and get its Response? I need to make several requests in each parse_link call.
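The callback model described in the question can be illustrated with a toy, pure-Python scheduler. This is a hypothetical simulation, not Scrapy's engine: the Request and Response classes, the run function, and the FAKE_WEB dict standing in for the network are all invented for illustration. It shows that a callback never receives a response inline; it yields requests, and the scheduler later calls each request's callback with the fetched response, so a single callback can issue as many requests as it needs.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the network: URL -> page body.
FAKE_WEB = {
    "http://example.com/": "<img src='a.png'><img src='b.png'>",
    "http://example.com/a.png": "PNG-A",
    "http://example.com/b.png": "PNG-B",
}

@dataclass
class Request:
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)

@dataclass
class Response:
    url: str
    body: str
    meta: dict

def run(start: Request) -> list:
    """Toy scheduler: fetch queued requests and feed each response to its callback."""
    queue = deque([start])
    items = []
    while queue:
        req = queue.popleft()
        resp = Response(req.url, FAKE_WEB[req.url], req.meta)
        for out in req.callback(resp):
            if isinstance(out, Request):
                queue.append(out)  # a yielded request is scheduled, not executed inline
            else:
                items.append(out)  # anything else is collected as a scraped item
    return items

def parse_page(response):
    # One callback may yield several requests; each is fetched later by run().
    for src in re.findall(r"src='([^']+)'", response.body):
        yield Request("http://example.com/" + src, parse_image, meta={"name": src})

def parse_image(response):
    # Called once per image request, with that request's response and meta.
    yield {"file": response.meta["name"], "bytes": response.body}

items = run(Request("http://example.com/", parse_page))
print(items)  # [{'file': 'a.png', 'bytes': 'PNG-A'}, {'file': 'b.png', 'bytes': 'PNG-B'}]
```

The two image items come back in the order their requests were queued, each produced by a separate callback invocation rather than inside parse_page itself.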

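You can use an item pipeline to download the images (Scrapy ships an ImagesPipeline for this). Alternatively, if you want to issue the requests yourself, yield them from the callback; Scrapy schedules every Request you yield and calls its callback with the resulting response: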

import os
from urllib.parse import urlparse

from scrapy import Request

def parse_link(self, response):
    """Download the source of the page"""

    # CODE HERE

    # my_loader: the ItemLoader set up in CODE HERE (not shown)
    item = my_loader.load_item()

    # Now search for images
    imgs = response.xpath('//img/@src').getall()

    # Download images
    path = '/local/path/to/where/i/want/the/images/'
    item['path'] = path
    item['images'] = []  # assumes the item defines 'path' and 'images' fields

    for image_src in imgs:  # each entry is already a src string
        item['images'].append(image_src)
        yield Request(response.urljoin(image_src),
                      callback=self.parse_images,
                      meta=dict(path=path))

    yield item

def parse_images(self, response):
    """Save images to disk"""

    path = response.meta.get('path')

    # Use the last segment of the URL path as the file name
    n = os.path.basename(urlparse(response.url).path)
    with open(os.path.join(path, n), 'wb') as f:
        f.write(response.body)
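For the pipeline route mentioned at the start of the answer, a minimal sketch using Scrapy's built-in ImagesPipeline might look like this. It assumes a recent Scrapy (the scrapy.pipelines.images module path, response.xpath(...).getall() and response.urljoin) with Pillow installed, and it yields a plain dict item; the settings values and the image directory are placeholders. The pipeline downloads every URL listed in the item's image_urls field into IMAGES_STORE and records the download results in an images field.

```python
# settings.py (sketch): enable Scrapy's built-in images pipeline.
ITEM_PIPELINES = {"scrapy.pipelines.images.ImagesPipeline": 1}
IMAGES_STORE = "/local/path/to/where/i/want/the/images/"

# Method of your CrawlSpider subclass (sketch): instead of yielding one
# Request per image, yield an item listing the absolute image URLs and
# let the pipeline fetch them.
def parse_link(self, response):
    srcs = response.xpath("//img/@src").getall()
    yield {"image_urls": [response.urljoin(src) for src in srcs]}
```

One design difference from the manual approach: the pipeline names the stored files by a hash of their URL (under a full/ subdirectory) rather than keeping the original file names, so a mirror that must preserve names is easier to build with the manual parse_images callback.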