Python: Scrapy callback function is never called

python scrapy


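I'm new to Scrapy and Python. I've spent several hours trying to debug this and searching for useful answers, but I'm still stuck. I'm trying to extract data from www.pro-football-reference.com. Here is the code I have so far: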

import scrapy

from nfl_predictor.items import NflPredictorItem

class NflSpider(scrapy.Spider):
    name = "nfl2"
    allowed_domains = ["http://www.pro-football-reference.com/"]
    start_url = [
        "http://www.pro-football-reference.com/boxscores/201509100nwe.htm"
    ]

    def parse(self, response):
        print "parse"
        for href in response.xpath('//*[@id="page_content"]/div[1]/table/tr/td/a/@href'):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_game_content)

    def parse_game_content(self, response):
        print "parse_game_content"
        items = []
        for sel in response.xpath('//table[@id = "team_stats"]/tr'):
            item = NflPredictorItem()
            item['away_stats'] = sel.xpath('td[@align = "center"][1]/text()').extract()
            item['home_stats'] = sel.xpath('td[@align = "center"][2]/text()').extract()
            items.append(item)
        return items

I'm debugging with the parse command, using this invocation:

scrapy parse --spider=nfl2 "http://www.pro-football-reference.com/boxscores/201509100nwe.htm"

I get the following output:

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[]

# Requests  -----------------------------------------------------------------
[<GET http://www.pro-football-reference.com/years/2015/games.htm>,
 <GET http://www.nfl.com/scores/2015/REG1>,
 <GET http://www.pro-football-reference.com/boxscores/201509130buf.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130chi.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130crd.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130dal.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130den.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130htx.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130jax.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130nyj.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130rai.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130ram.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130sdg.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130tam.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509130was.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509140atl.htm>,
 <GET http://www.pro-football-reference.com/boxscores/201509140sfo.htm>]

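By default, the parse command fetches the given URL and parses it with the spider that handles it, using the method passed with the --callback option, or parse if none is given. In your case it only runs the parse function, so parse_game_content is never invoked. Change the command to pass --callback, as follows: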
scrapy parse --spider=nfl2 "http://www.pro-football-reference.com/boxscores/201509100nwe.htm" --callback=parse_game_content
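
To make the fallback behaviour concrete, here is a minimal pure-Python sketch of the idea. It is not Scrapy's actual implementation: DemoSpider, run_parse_command and the dict used as a "response" are hypothetical stand-ins chosen only for illustration.

```python
# Simplified illustration only; this is NOT how Scrapy is implemented internally.
class DemoSpider:
    """Hypothetical stand-in for a spider with two callbacks."""

    def parse(self, response):
        # Default callback: only produces follow-up "requests".
        return ["request for " + link for link in response["links"]]

    def parse_game_content(self, response):
        # Second callback: runs only when it is selected explicitly.
        return [{"stats": row} for row in response["rows"]]


def run_parse_command(spider, response, callback=None):
    """Call the named callback once, falling back to 'parse' when none is given."""
    method = getattr(spider, callback or "parse")
    return list(method(response))


# A plain dict stands in for the downloaded page (assumption for the demo).
response = {"links": ["/boxscores/a.htm"], "rows": ["10", "21"]}

default_result = run_parse_command(DemoSpider(), response)
explicit_result = run_parse_command(
    DemoSpider(), response, callback="parse_game_content"
)

print(default_result)   # only requests, no items
print(explicit_result)  # items from the explicitly named callback
```

Without a callback name only the requests come back, which matches the empty "Scraped Items" list in the output above.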

Also, it would be better to change the parse_game_content function as follows:

def parse_game_content(self, response):
    print "parse_game_content"
    for sel in response.xpath('//table[@id = "team_stats"]/tr'):
        item = NflPredictorItem()
        item['away_stats'] = sel.xpath('td[@align = "center"][1]/text()').extract()
        item['home_stats'] = sel.xpath('td[@align = "center"][2]/text()').extract()
        yield item
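
As a rough illustration of why the yield version is equivalent but simpler, the sketch below compares the two styles in plain Python. Plain dicts stand in for NflPredictorItem and a list of tuples stands in for the table rows; both are assumptions made for the demo, not part of the original spider.

```python
def parse_rows_as_list(rows):
    """Collect every item in a list and return it at the end."""
    items = []
    for away, home in rows:
        item = {"away_stats": away, "home_stats": home}
        items.append(item)  # must stay inside the loop, or only the last row is kept
    return items


def parse_rows_as_generator(rows):
    """Yield each item as soon as it is built; no list bookkeeping needed."""
    for away, home in rows:
        yield {"away_stats": away, "home_stats": home}


# Hypothetical sample rows: (away value, home value) per table row.
rows = [("21", "28"), ("3", "5")]

list_items = parse_rows_as_list(rows)
generator_items = list(parse_rows_as_generator(rows))
```

Both functions produce the same items; the generator form simply avoids the append and return steps where indentation mistakes tend to creep in.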

Are you sure you have imported all the libraries?