Downloading Google images from multiple URLs with Scrapy


image url scrapy


I am trying to download images from multiple URLs in Google Image Search.

However, I only want 15 images from each URL.

class imageSpider(BaseSpider):
    name = "image"
    start_urls = [
        'https://google.com/search?q=simpsons&tbm=isch'
        'https://google.com/search?q=futurama&tbm=isch'
        ]


    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        items = []
        images = hxs.select("//div[@id='ires']//div//a[@href]")
        count = 0
        for image in images:
            count += 1
            item = ImageItem()
            image_url = image.select(".//img[@src]")[0].extract()
            import urlparse
            image_absolute_url = urlparse.urljoin(response.url, image_url.strip())
            index = image_absolute_url.index("src")
            changedUrl = image_absolute_url[index+5:len(image_absolute_url)-2]
            item['image_urls'] = [changedUrl]
            index1 = site['url'].index("search?q=")
            index2 = site['url'].index("&tbm=isch")
            imageName = site['url'][index1+9:index2]
            download(changedUrl,imageName + str(count)+".png")
            items.append(item)
            if count == 15:
                break
        return items

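The download function downloads the image (I have the code for it; that is not the problem).

The problem is that when I break, it stops at the first URL and does not continue to the next one. How can I make it download 15 images from the first URL and then 15 images from the second URL? I am using break because each Google Images page has about 1000 images, and I do not want that many.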

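The problem is not the break statement. You are missing a comma in start_urls. Without it, Python joins the two adjacent string literals into one long URL, so the spider only ever requests a single page.

It should look like this: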

start_urls = [
    'http://google.com/search?q=simpsons&tbm=isch',
    'http://google.com/search?q=futurama&tbm=isch'
]
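
To see why the missing comma matters, here is a minimal sketch that needs no Scrapy at all. The variable names missing_comma and with_comma are made up for illustration; the URLs are the ones from the question.

```python
# Adjacent string literals with no comma between them are joined
# into a single string by Python.
missing_comma = [
    'https://google.com/search?q=simpsons&tbm=isch'
    'https://google.com/search?q=futurama&tbm=isch'
]

# With the comma, the list holds two separate URLs.
with_comma = [
    'https://google.com/search?q=simpsons&tbm=isch',
    'https://google.com/search?q=futurama&tbm=isch',
]

print(len(missing_comma))  # 1
print(len(with_comma))     # 2
print(missing_comma[0] == with_comma[0] + with_comma[1])  # True
```

So with the original list the spider gets one start URL, not two, which is why it looks as if it stops after the first search.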


P.S. In Python you can write "for image in images[:15]:" instead of counting inside the loop and using break.
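
As a rough sketch of the slicing idea, assuming a plain list stands in for the selector results and "simpsons" stands in for the name taken from the search URL (both are placeholders, not part of the original spider):

```python
# Hypothetical stand-in for the ~1000 results found on one page.
images = ["image-%d" % i for i in range(1000)]

names = []
# Slicing keeps at most the first 15 entries; enumerate supplies the
# counter that the original loop maintained by hand.
for count, image in enumerate(images[:15], start=1):
    names.append("simpsons" + str(count) + ".png")

print(len(names))           # 15
print(names[0], names[-1])  # simpsons1.png simpsons15.png

# Slicing does not fail when fewer than 15 items are available.
short = ["a", "b", "c"][:15]
print(len(short))           # 3
```

This keeps the limit in one place and removes the need for both the manual counter and the break.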
