
Python scrapy start URLs


Is it possible to do the following with multiple URLs like the ones below? Each link will have about 50 pages to crawl and loop through. The current solution works, but only when I use one URL rather than several.

start_urls = [
    'https://www.xxxxxxx.com.au/home-garden/page-%s/c18397' % page for page in range(1, 50),
    'https://www.xxxxxxx.com.au/automotive/page-%s/c21159' % page for page in range(1, 50),
    'https://www.xxxxxxx.com.au/garden/page-%s/c25449' % page for page in range(1, 50),
]
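A note on why the snippet above fails: a Python list literal cannot contain several comma-separated comprehensions, so the three-URL version raises a SyntaxError. A minimal sketch of the intended result using a single nested comprehension (the category URLs are taken from the question):

```python
# The three page ranges from the question, merged into one nested list
# comprehension; a list literal cannot hold multiple comma-separated
# comprehensions, which is why the original three-URL version fails.
categories = [
    'https://www.xxxxxxx.com.au/home-garden/page-%s/c18397',
    'https://www.xxxxxxx.com.au/automotive/page-%s/c21159',
    'https://www.xxxxxxx.com.au/garden/page-%s/c25449',
]
start_urls = [url % page for url in categories for page in range(1, 50)]
print(len(start_urls))  # 3 categories x 49 pages = 147 URLs
```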

You can do that with another list. I've shared the code for it below; hope this is what you're looking for.

final_urls = []
start_urls = [
    'https://www.xxxxxxx.com.au/home-garden/page-%s/c18397',
    'https://www.xxxxxxx.com.au/automotive/page-%s/c21159',
    'https://www.xxxxxxx.com.au/garden/page-%s/c25449',
]
final_urls.extend(url % page for page in range(1, 50) for url in start_urls)
Output snippet
final_urls[1:20]


 ['https://www.xxxxxxx.com.au/automotive/page-1/c21159',
 'https://www.xxxxxxx.com.au/garden/page-1/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-2/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-2/c21159',
 'https://www.xxxxxxx.com.au/garden/page-2/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-3/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-3/c21159',
 'https://www.xxxxxxx.com.au/garden/page-3/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-4/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-4/c21159',
 'https://www.xxxxxxx.com.au/garden/page-4/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-5/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-5/c21159',
 'https://www.xxxxxxx.com.au/garden/page-5/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-6/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-6/c21159',
 'https://www.xxxxxxx.com.au/garden/page-6/c25449',
 'https://www.xxxxxxx.com.au/home-garden/page-7/c18397',
 'https://www.xxxxxxx.com.au/automotive/page-7/c21159']
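Note that the slice `final_urls[1:20]` skips index 0, which is why the listing above starts with the automotive page-1 URL; the home-garden page-1 URL sits at index 0. A quick plain-Python check of the full ordering:

```python
# Rebuild final_urls exactly as in the answer above and inspect the ordering:
# with the page loop outermost, the three categories interleave per page.
start_urls = [
    'https://www.xxxxxxx.com.au/home-garden/page-%s/c18397',
    'https://www.xxxxxxx.com.au/automotive/page-%s/c21159',
    'https://www.xxxxxxx.com.au/garden/page-%s/c25449',
]
final_urls = []
final_urls.extend(url % page for page in range(1, 50) for url in start_urls)

print(final_urls[0])    # home-garden page-1, excluded by the [1:20] slice
print(final_urls[1])    # automotive page-1, the first item shown above
print(len(final_urls))  # 147
```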
Regarding your follow-up question, have you tried this?

def parse(self, response):
    for link in final_urls:
        yield scrapy.Request(link)

I'd suggest using start_requests for this:

def start_requests(self):
    base_urls = [
        'https://www.xxxxxxx.com.au/home-garden/page-{page_number}/c18397',
        'https://www.xxxxxxx.com.au/automotive/page-{page_number}/c21159',
        'https://www.xxxxxxx.com.au/garden/page-{page_number}/c25449',
    ]

    for page in range(1, 50):
        for base_url in base_urls:
            url = base_url.format(page_number=page)
            yield scrapy.Request(url, callback=self.parse)
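Since running `start_requests` needs Scrapy, the URL generation itself can be checked in plain Python; the loop order (page outermost, category innermost) yields the same interleaving as the `final_urls` answer above:

```python
# Plain-Python check of the URLs that start_requests generates, in order;
# no Scrapy needed, only str.format.
base_urls = [
    'https://www.xxxxxxx.com.au/home-garden/page-{page_number}/c18397',
    'https://www.xxxxxxx.com.au/automotive/page-{page_number}/c21159',
    'https://www.xxxxxxx.com.au/garden/page-{page_number}/c25449',
]
urls = [b.format(page_number=page) for page in range(1, 50) for b in base_urls]

print(urls[0])  # home-garden page-1
print(urls[3])  # home-garden page-2: all three categories cycle each page
```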

Thank you very much! But how do I start parsing it? Does final_urls.extend(url % page for page in range(1, 50) for url in start_urls) loop through final_urls and request and process each URL?

This is what I have; how do I trigger it to run?

def parse(self, response):
    sel = Selector(response)
    for link in sel.xpath("//*[contains(@href, '/s-ad/')]"):
        ad_link = link.css('a::attr(href)').extract_first()
        absolute_url = self.base_url + ad_link
        yield response.follow(absolute_url, self.parse_each_ad)

Please check this answer; I also think it's what you're looking for.

Sorry, I can't see how that has anything to do with this?