Python 3.x Scrapy: nested selectors: the child selector runs on the whole page instead of on the parent selection


python-3.x xpath scrapy


I want to save, in a .jl file, all the data listed on a web page that relates to one item (say, a person). The parsing should look like this:

for eachperson in response.xpath("//div[@class='person']"):
    person = myItem()
    person['name'] = eachperson.xpath('//h2[@class="name"]/text()').extract()
    person['date'] = eachperson.xpath('//h3[@class="date"]/text()').extract()
    person['address'] = eachperson.xpath('//div[@class="address"]/p/text()').extract()
    yield person

But I have a bug. I have adapted my spider to a page (see below) so that you can reproduce it:

import scrapy
import requests

class TutoSpider(scrapy.Spider):
    name = "tuto"
    start_urls = [
            'file:///C:/Users/Me/Desktop/data.html'
        ]

    def parse(self, response):
        for quotechild in response.xpath("//div[@class='quote']"):
            print("\n\n", quotechild.extract())
            print("\n\n", quotechild.xpath('//span[@class="text"]/text()').extract())

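The first print returns what I expect, but the second print returns every span class="text" on the whole page as a list, not just the ones inside quotechild.
I have followed many other tutorials, but I can't find what I'm doing wrong.
I'm running on a local file because the original page I'm working with renders its HTML through JavaScript.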

The .html file is just a local copy of a page.
Example of the first print:
<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“A woman is like a tea bag; you never know how strong it is until it's in hot water.”</span>
        <span>by <small class="author" itemprop="author">Eleanor Roosevelt</small>
        <a href="/author/Eleanor-Roosevelt">(about)</a>
        </span>
        ...
    </div>

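Starting an XPath expression with // makes it match from the document root, regardless of which element you call it on.
To make the XPath relative to the element (searching only its descendants), start the expression with .//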
>>> len(quotechild.xpath('//span[@class="text"]/text()'))
10
>>> len(quotechild.xpath('.//span[@class="text"]/text()'))
1
>>> quotechild.xpath('.//span[@class="text"]/text()').extract_first()
'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'

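Thank you! One question, though: when only quotechild is passed, how does Scrapy "know" what the document root is? quotechild is just a few divs, not the whole document.
quotechild is a Scrapy selector, an object that knows (among other things) which document it is part of.
Oh, OK, that makes sense. Thanks!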