Python Scrapy returns an empty array from XPath


python python-3.x xpath scrapy web-crawler


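I am trying to collect data about an athlete from the web page given in start_urls below. I have been able to collect the athlete's name, but I am having trouble collecting the school name with the same approach. I know the school name is contained as text in a link inside the h1 block, but the query only returns an empty array.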

Here is my code:

import scrapy

class AthletesSpider(scrapy.Spider):
    name = 'athletes'
    allowed_domains = ['athletic.net']
    start_urls = ['https://www.athletic.net/TrackAndField/Athlete.aspx?AID=7844096#!/L0']

    def parse(self, response):
        yield {
            'athlete_name' : response.xpath("//h2/text()").extract_first(),
            'school_name' : response.xpath("//h1/a/text()").extract_first()
        }

Am I missing something?

Add a comma in your dictionary:

import scrapy

class AthletesSpider(scrapy.Spider):
    name = 'athletes'
    allowed_domains = ['athletic.net']
    start_urls = ['https://www.athletic.net/TrackAndField/Athlete.aspx?AID=7844096#!/L0']

    def parse(self, response):
        yield {
            'athlete_name' : response.xpath("//h2/text()").extract_first(),  # <-- here
            'school_name' : response.xpath("//h1/a/text()").extract_first()
        }

Oh my gosh, thank you, that was a silly mistake. But the second line still returns an empty array instead of the school name. Is something else missing?

One thing you can try (if you have Chrome) is to view the page, find the element, right-click it, and click Copy XPath. That is usually how I identify elements easily. For the school element I got: //*[@id="anetMain"]/div[3]/team-nav/div/div/team-nav-logo/div/div/h1/a

Oh, that's a useful tip! But I get another "invalid syntax" error when I try to run 'school_name': response.xpath("//*[@id="anetMain"]/div[3]/team-nav/div/div/team-nav-logo/div/div/div/h1/a").extract_first()
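The "invalid syntax" error in the last comment is consistent with a quoting problem: the XPath copied from Chrome contains double quotes around anetMain, and placing it inside a double-quoted Python string ends the string early. A minimal sketch of two ways around that follows. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without Scrapy, and the SAMPLE markup and the school name in it are hypothetical, not taken from the real page. ElementTree needs a relative path (.//), whereas with Scrapy the same string would start with // and be passed to response.xpath(...).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely shaped like the copied XPath; not the real page.
SAMPLE = """
<html>
  <body>
    <div id="anetMain">
      <div>
        <h1><a href="#">Example High School</a></h1>
      </div>
    </div>
  </body>
</html>
""".strip()

root = ET.fromstring(SAMPLE)

# Broken: the inner double quotes close the Python string early, so this
# line would raise SyntaxError if it were uncommented.
# xpath = ".//*[@id="anetMain"]/div/h1/a"

# Option 1: single quotes inside a double-quoted Python string.
xpath_single_inner = ".//*[@id='anetMain']/div/h1/a"

# Option 2: double quotes inside a single-quoted Python string.
xpath_double_inner = './/*[@id="anetMain"]/div/h1/a'

# Both strings describe the same XPath and select the same <a> element.
school_a = root.find(xpath_single_inner).text
school_b = root.find(xpath_double_inner).text
print(school_a, school_b)
```

This only addresses the syntax error. Whether the selector then matches anything still depends on the h1/a element being present in the HTML that the spider actually downloads.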