Python 2.7: How do I execute a function after all crawling is finished in Scrapy?

Tags: python-2.7, web-scraping, scrapy, screen-scraping, scrapy-spider

The spider_closed() function is not executing. If I put only a print statement in it, it prints, but if I make any function call and try to return values, it does not work.

import scrapy
import re
from pydispatch import dispatcher
from scrapy import signals

from SouthShore.items import Product
from SouthShore.internalData import internalApi
from scrapy.http import Request

class bestbuycaspider(scrapy.Spider):
    name = "bestbuy_dca"

    allowed_domains = ["bestbuy.ca"]

    start_urls = ["http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+beds",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+night+stand",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+headboard",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+desk",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+bookcase",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+dresser",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+tv+stand",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+armoire",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+kids",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+changing+table",
              "http://www.bestbuy.ca/Search/SearchResults.aspx?type=product&page=1&sortBy=relevance&sortDir=desc&pageSize=96&query=south+shore+furniture+baby"]

    def __init__(self, jsondetails="", serverdetails="", *args, **kwargs):
        super(bestbuycaspider, self).__init__(*args, **kwargs)
        dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
        self.jsondetails = jsondetails
        self.serverdetails = serverdetails
        self.data = []

    def parse(self, response):
        # my stuff here (parsing logic omitted in the question)
        pass



    def spider_closed(self, spider):
        print "returning values"
        self.results = internalApi(self.jsondetails, self.serverdetails)
        self.results['extractedData'] = self.data
        yield self.results
1) I want to call some functions and return the scraped values.

You can create an item pipeline with a close_spider() method:

class MyPipeline(object):
    def close_spider(self, spider):
        do_something_here()

Just don't forget to activate it in settings.py, as described in the item pipeline documentation.
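A minimal sketch of such a pipeline, assuming the internalApi helper and the jsondetails/serverdetails attributes from the question's spider; process_item() runs once for every yielded item, and close_spider() runs after all crawling is finished:

# pipelines.py -- sketch only; internalApi and the spider attributes
# (jsondetails, serverdetails) are taken from the question's code.
from SouthShore.internalData import internalApi

class CollectAndPostProcessPipeline(object):
    def open_spider(self, spider):
        # Called when the spider starts: initialize the item buffer.
        self.items = []

    def process_item(self, item, spider):
        # Called for every item the spider yields; keep a copy and
        # pass the item on unchanged.
        self.items.append(item)
        return item

    def close_spider(self, spider):
        # Called after the crawl finishes: do the post-processing that
        # spider_closed() was attempting, then store the result somewhere
        # explicit (a file, a database, a spider attribute), because
        # nothing yielded here goes back into the crawl.
        results = internalApi(spider.jsondetails, spider.serverdetails)
        results['extractedData'] = self.items
        spider.results = results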

So you want to keep crawling inside spider_closed? Are you yielding items or requests?

No, I want to return the crawled items after the spider closes and call another function in a different .py file, so it performs some operations and returns some values. I need to append the called function's output to my crawled values and return them together.

Scrapy items are not stored in memory; they are output the moment yield item is called. If you want to do something with each item as it is output, you have to use a pipeline, but consuming all the items only once the spider ends is very bad practice (because you would have to store them all yourself).

Sorry, I'm new to Scrapy. Do I need to create the pipeline class and the close_spider function in the pipelines.py file, or can I just change the class name in the spider file itself? If I need to create the class and function in pipelines.py, my doubts are: 1) how do I import the pipeline class into my spider file, or will it be executed automatically? 2) how do I pass the crawled values to the close_spider function in the pipelines.py file?
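On the activation question: you do not import the pipeline into the spider; Scrapy instantiates and calls it automatically once it is registered in settings.py. A sketch, assuming the pipeline class above lives in the project's pipelines.py (the number sets the running order; lower runs first):

# settings.py
ITEM_PIPELINES = {
    'SouthShore.pipelines.CollectAndPostProcessPipeline': 300,
}

Every item yielded from parse() is then handed to process_item() one at a time, so you never pass the crawled values to close_spider() yourself; the pipeline accumulates them as the crawl runs.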