Python: How to run code after Scrapy finishes crawling

How do I run code after Scrapy has finished crawling? I have a spider:
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class KpallSpider(CrawlSpider):
    name = 'test'
    allowed_domains = ['kupujemprodajem.com']
    start_urls = ['https://www.kupujemprodajem.com/Usluge-Auto-moto/Automehanicar/1410-1426-1-grupa.htm']
    rules = [Rule(LinkExtractor(allow=['grupa.htm']), callback='parse_item', follow=True)]

    def parse_item(self, response):
        yield {'url': response.url}
I am writing the output to JSON. After Scrapy has finished crawling, I want to run:

print('Something')
You can do this in your favorite shell:

scrapy crawl test -o items.json && echo "Something"
Or do the same in Python:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess(settings={
    'FEED_FORMAT': 'json',
    'FEED_URI': 'items.json',
})

process.crawl(MySpider)
process.start()  # the script will block here until the crawl is finished
print('Something')