Python XPath substring-after and substring-before with Scrapy

python web-scraping scrapy

I am using Scrapy to scrape a list of movies:

import scrapy
class ScrapeMovies(scrapy.Spider):
    name='movies-to-see'

    start_urls = [
        'https://www.listchallenges.com/200-movies-to-see-before-you-die/'
    ]

    def parse(self, response):
        for film in response.xpath('//div[@class="item-click-area"]'):
            yield{
                'year': film.xpath('substring-before(substring-after(.//div[@class="item-name"]/text(), '('), ')')').extract()
                'title': film.xpath('.//div[@class="item-name"]/text()').extract()[0].strip(),
                'rank': film.xpath('.//div[@class="item-rank"]/text()').extract()[0].strip()
            }

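On the target page, each movie's year and title are joined together in a single string. I want to use XPath to extract the year from between the parentheses, but I keep getting a syntax error. Why? Or is there a better way to pull out the year the movie was made?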

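You need to mix single and double quotes correctly in the XPath expression. The Python string is delimited by single quotes, so the single quote in '(' ends the string early and Python reports a syntax error. Use double quotes for the string literals inside the XPath. The 'year' entry is also missing a trailing comma before 'title', which is a second syntax error. Try: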

'substring-before(substring-after(.//div[@class="item-name"]/text(), "("), ")")'
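As for the second part of the question (another way to get the year), one option is to extract the full text once and split it in Python with a regular expression. The following is only a minimal sketch: the helper name split_title_year and the pattern are hypothetical, and it assumes the text looks like "Title (YYYY)" with a four-digit year in trailing parentheses, which the page may not always guarantee.

```python
import re

# Assumed text format: "Some Title (1972)". The lazy title group lets a
# title that itself contains parentheses keep them, as long as the final
# parenthesised group is a four-digit year.
TITLE_YEAR = re.compile(r'^(?P<title>.*?)\s*\((?P<year>\d{4})\)\s*$', re.DOTALL)


def split_title_year(raw):
    """Return (title, year) from text like 'Title (1999)'.

    year is None when no trailing '(YYYY)' group is found; in that case
    the stripped input is returned unchanged as the title.
    """
    text = raw.strip()
    match = TITLE_YEAR.match(text)
    if match is None:
        return text, None
    return match.group('title'), match.group('year')


if __name__ == '__main__':
    print(split_title_year('  The Godfather (1972)\n'))
    print(split_title_year('Casablanca'))
```

Inside parse, the helper could be fed the same text the spider already extracts, for example title, year = split_title_year(film.xpath('.//div[@class="item-name"]/text()').extract()[0]), so both fields come from one XPath query. Scrapy selectors also offer .re_first() if you prefer to apply the regular expression directly on the selector.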