Getting an error while extracting data using Scrapy (Python)

I am trying to extract data with the spider below, but I am getting an error.

import scrapy
import logging


class CountriesSpider(scrapy.Spider):
    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['https://www.worldometers.info/world-population/population-by-country/']

    def parse(self, response):
        # Every country link in the overview table
        countries = response.xpath("//td/a")
        for country in countries:
            name = country.xpath(".//text()").get()
            link = country.xpath(".//@href").get()

            # absolute_url = f"https://www.worldometers.info{link}"
            # absolute_url = response.urljoin(link)

            # response.follow accepts relative links; the country name travels in meta
            yield response.follow(url=link, callback=self.parse_country, meta={'country_name': name})

    def parse_country(self, response):
        name = response.request.meta['country_name']
        # NOTE: left as posted. The ")" after the first "[1]" has no matching "(",
        # which is most likely what lxml rejects as an invalid expression.
        rows = response.xpath("(//table[@class='table table-striped table-bordered table-hover table-condensed table-list'])[1])[1]/tbody/tr")
        for row in rows:
            year = row.xpath(".//td[1]/text()").get()
            population = row.xpath(".//td[2]/strong/text()").get()
            yield {
                'year': year,
                'population': population
            }

I am using a conda virtual environment and VS Code on macOS.

(new_Virtual_workspace) SubhrajyotisAir:worldometer subhrajyotisaha$ scrapy crawl countries
2021-05-29 23:33:14 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: worldometer)
2021-05-29 23:33:14 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 21.2.0, Python 3.8.10 (default, May 19 2021, 11:01:55) - [Clang 10.0.0 ], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform macOS-10.14.1-x86_64-i386-64bit
2021-05-29 23:33:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-05-29 23:33:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'worldometer',
 'NEWSPIDER_MODULE': 'worldometer.spiders',
 'ROBOTSTXT_OBEY': True,
 'SPIDER_MODULES': ['worldometer.spiders']}
2021-05-29 23:33:14 [scrapy.extensions.telnet] INFO: Telnet Password: 87f0a20eef9428d7
2021-05-29 23:33:14 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats']
2021-05-29 23:33:14 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-05-29 23:33:14 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-05-29 23:33:14 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2021-05-29 23:33:14 [scrapy.core.engine] INFO: Spider opened
2021-05-29 23:33:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-29 23:33:14 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-05-29 23:33:18 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.worldometers.info/robots.txt> (referer: None)
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 2 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 12 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 14 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 16 without any user agent to enforce it on.
2021-05-29 23:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.worldometers.info/world-population/population-by-country/> (referer: None)
2021-05-29 23:33:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.worldometers.info/world-population/ethiopia-population/> (referer: https://www.worldometers.info/world-population/population-by-country/)
2021-05-29 23:33:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.worldometers.info/world-population/ethiopia-population/> (referer: https://www.worldometers.info/world-population/population-by-country/)
Traceback (most recent call last):
  File "/Users/subhrajyotisaha/opt/anaconda3/envs/new_Virtual_workspace/lib/python3.8/site-packages/parsel/selector.py", line 236, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
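
The traceback ends inside an xpath() call while a country page is being processed, so the failing expression is most likely the one assigned to rows in parse_country. That string closes a parenthesis it never opened (the ")" after the first "[1]"). The sketch below is a small standard-library check for unbalanced parentheses and brackets that points at the offending character; it does not use Scrapy or lxml. The helper name first_unbalanced and the FIXED expression are assumptions for illustration: FIXED supposes the extra "[1])" was a paste slip and that the first matching table is the one wanted.

```python
def first_unbalanced(expr):
    """Return the index of an unmatched bracket in expr, or None if balanced.

    Quoted string literals are skipped, since brackets inside them are plain text.
    """
    pairs = {")": "(", "]": "["}
    stack = []      # (opening char, index) for every bracket still open
    quote = None    # the quote character we are currently inside, if any
    for i, ch in enumerate(expr):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([":
            stack.append((ch, i))
        elif ch in pairs:
            # A closer with nothing open, or with the wrong opener on top
            if not stack or stack[-1][0] != pairs[ch]:
                return i
            stack.pop()
    # Anything left on the stack was opened but never closed
    return stack[0][1] if stack else None


TABLE = "//table[@class='table table-striped table-bordered table-hover table-condensed table-list']"

# The expression as posted in the spider
BROKEN = "(" + TABLE + ")[1])[1]/tbody/tr"
# Assumed intent: rows of the first matching table
FIXED = "(" + TABLE + ")[1]/tbody/tr"

for label, expr in (("posted", BROKEN), ("assumed fix", FIXED)):
    pos = first_unbalanced(expr)
    if pos is None:
        print(f"{label}: brackets are balanced")
    else:
        print(f"{label}: unmatched {expr[pos]!r} at index {pos}")
```

If the assumption holds, passing the FIXED string to response.xpath(...) in parse_country should get past the "Invalid expression" error; whether rows then contains data still depends on the page's actual markup.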