Python 2.7: how do I execute a function after all crawling is done in Scrapy?
The spider_closed() function is not being executed. If I only put a print statement in it, it prints, but if I make any function call and return the values, it does not work:
import scrapy
import re
from pydispatch import dispatcher
from scrapy import signals
from SouthShore.items import Product
from SouthShore.internalData import internalApi
from scrapy.http import Request
class bestbuycaspider(scrapy.Spider):
    name = "bestbuy_dca"
    allowed_domains = ["bestbuy.ca"]
    start_urls = ["http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+beds",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+night+stand",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+headboard",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+desk",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+bookcase",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+dresser",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+tv+stand",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+armoire",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+kids",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+changing+table",
                  "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+baby"]

    def __init__(self, jsondetails="", serverdetails="", *args, **kwargs):
        super(bestbuycaspider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.jsondetails = jsondetails
        self.serverdetails = serverdetails
        self.data = []
    def parse(self, response):
        # my stuff here
        pass

    def spider_closed(self, spider):
        print "returning values"
        self.results = internalApi(self.jsondetails, self.serverdetails)
        self.results['extractedData'] = self.data
        print self.results
        yield self.results
1) I want to call some functions and return the scraped values.

You can create a Pipeline with a close_spider() method:
class MyPipeline(object):
    def close_spider(self, spider):
        do_something_here()
Just don't forget to activate it in settings.py, as described in the documentation linked above.
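A minimal sketch of that activation, assuming the project package is named SouthShore (as the question's imports suggest) and the pipeline class lives in pipelines.py:

# settings.py -- register the pipeline; the number controls its order
# relative to other pipelines (lower values run first)
ITEM_PIPELINES = {
    'SouthShore.pipelines.MyPipeline': 300,
}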
So you want to continue crawling in spider_closed? Do you yield items or requests there?

No, I want to return the crawled items after the spider closes and call another function in a different .py file, so that it performs some operations and gives back some values; I need to append that function's output to my crawled values and return both.

Scrapy items are not stored in memory; they are output the moment yield item is called. If you want to process each item as it comes out, you have to use a pipeline. Consuming all the items only once the spider has finished is very bad practice, because you would have to store them yourself.

Sorry, I'm new to Scrapy. Do I need to create the Pipeline class and the close_spider() function in the pipelines.py file, or can I just rename the class in the spider file itself? If they belong in pipelines.py, my doubts are: 1) how do I import the pipeline class into my spider file, or will it execute automatically? 2) how do I pass the crawled values to the close_spider() function in pipelines.py?
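Addressing those two doubts with a hedged sketch: nothing has to be imported into the spider, because Scrapy instantiates and runs whatever is registered under ITEM_PIPELINES automatically; and the crawled values reach the pipeline through process_item(), which is called for every item the spider yields. A minimal pipelines.py along those lines, reusing the question's internalApi call (its exact signature, and the spider exposing jsondetails and serverdetails as in the question, are assumptions), might look like:

# pipelines.py (Python 2.7)
from SouthShore.internalData import internalApi

class MyPipeline(object):
    def __init__(self):
        # crawled items accumulate here while the spider runs
        self.data = []

    def process_item(self, item, spider):
        # called once for every item the spider yields
        self.data.append(dict(item))
        return item

    def close_spider(self, spider):
        # called exactly once, after all crawling is done
        results = internalApi(spider.jsondetails, spider.serverdetails)
        results['extractedData'] = self.data
        print results  # Python 2 print statement
        # hand `results` to whatever post-processing comes next

Note that the caveat from the comment above still applies: buffering every item in self.data is only reasonable for small crawls; for anything sizeable, process each item as it arrives in process_item() instead.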