Web scraping fails: it gives an empty output

web-scraping scrapy web-crawler

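I am scraping a website. I used what I believe is the correct XPath for the element, but the result I get is empty. I am using the following code: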

import scrapy
from scrapy.http.request import Request
from indicators.ESGIndicators import ESGIndicators
from scrapy.spiders import CrawlSpider,Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html

class mySpider(scrapy.Spider):
    name = "YALE"
    allowed_domains = ["epi.envirocenter.yale.edu"]
    start_urls = (
        'https://epi.envirocenter.yale.edu/epi-indicator-report/WWT',
        )

    def parse(self, response):
        return Request(
            url='https://epi.envirocenter.yale.edu/epi-indicator-report/WWT',
            callback=self.parse_table
        )

    def parse_table(self,response):
        for tr in response.xpath('//*[@id="block-system-main"]/div/div/div/div[3]/table[2]/tr'):
            item = ESGIndicators()
            item['country'] = tr.xpath('td[1]/a/text()').extract_first()
            item['data1'] = tr.xpath('td[2]/text()').extract()
            item['data2'] = tr.xpath('td[3]/text()').extract()
            item['data3'] = tr.xpath('td[4]/text()').extract()
            item['data4'] = tr.xpath('td[5]/text()').extract()
            print(item)
            yield item

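I don't get any errors, but nothing is scraped. I also tried adding tbody to the XPath, but that did not work either.

Does anyone know what the problem is?

Thanks in advance.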

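You should double-check your XPath expression

//*[@id="block-system-main"]/div/div/div/div[3]/table[2]/tr

In Scrapy, a syntactically valid XPath/CSS expression that does not match any element on the page returns an empty list without raising an error, so the entire loop in parse_table is skipped. Try debugging the target page from the command line:

scrapy shell https://epi.envirocenter.yale.edu/epi-indicator-report/WWT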
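The silent failure described above can be reproduced in a small sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors so that it runs without a network connection or extra packages; in Scrapy the analogous call is response.xpath(...), which returns an empty SelectorList when nothing matches. The markup, the cell values and the helper name rows_for are invented for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the target page.
# The real page's markup is not reproduced here.
PAGE = """
<html><body>
  <div id="block-system-main">
    <table>
      <tr><td>Alpha</td><td>1.0</td></tr>
      <tr><td>Beta</td><td>2.0</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())


def rows_for(path):
    """Return the text of the first cell of every row matched by `path`."""
    names = []
    # When `path` matches nothing, findall() returns an empty list,
    # so the loop body never runs and no exception is raised.
    for tr in root.findall(path):
        names.append(tr.findtext("td"))
    return names


# Syntactically valid, but the div contains only one table, so table[2] matches nothing.
brittle = rows_for(".//div[@id='block-system-main']/table[2]/tr")

# Same path without the wrong positional index: the rows are found.
working = rows_for(".//div[@id='block-system-main']/table/tr")

print(brittle)
print(working)
```

Checking the length of the match result before looping, either here or inside scrapy shell, is a quick way to tell "the XPath matched nothing" apart from "the callback never ran".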

You should use XPath expressions that are as simple as possible:

for tr in response.xpath('//tr[contains(@class, "epi-row-territory")]'):
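As a rough illustration of why a class-based match is more robust than a long positional path, the sketch below selects rows by the substring test that contains(@class, "epi-row-territory") performs. It again uses xml.etree.ElementTree, which does not implement contains(), so the substring check is done in plain Python instead. Apart from the class name epi-row-territory and the field names country and data1, everything here (the other class names, the cell values, the helper territory_rows) is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: only the class name "epi-row-territory" comes from
# the answer above; the other class names and all cell values are invented.
PAGE = """
<table>
  <tr class="header"><th>Country</th><th>Score</th></tr>
  <tr class="epi-row-territory odd"><td><a href="#">Alpha</a></td><td>1.0</td></tr>
  <tr class="epi-row-territory even"><td><a href="#">Beta</a></td><td>2.0</td></tr>
</table>
"""


def territory_rows(root):
    """Yield <tr> elements whose class attribute contains 'epi-row-territory'.

    This mirrors the substring semantics of the XPath predicate
    contains(@class, "epi-row-territory").
    """
    for tr in root.iter("tr"):
        if "epi-row-territory" in tr.get("class", ""):
            yield tr


root = ET.fromstring(PAGE.strip())

# Build one dict per matching row, using relative paths as in parse_table.
items = [
    {
        "country": tr.findtext("td[1]/a"),
        "data1": tr.findtext("td[2]"),
    }
    for tr in territory_rows(root)
]

print(items)
```

Because the match depends only on the row's own class attribute, it keeps working if wrapper div elements are added or removed, which is where the long positional path tends to break.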

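I get "'module' object is not callable". But that is not the cause, because I get the same message when I debug another page that I have already scraped successfully.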