Python 3.x Scrapy: nested selectors: the child selector runs on the whole page instead of on the parent selection


I want to save to a .jl file all the data listed on a web page that relates to one item (say, a person). The parsing should look like this:

for eachperson in response.xpath("//div[@class='person']"):
    person = myItem()
    person['name'] = eachperson.xpath('//h2[@class="name"]/text()').extract()
    person['date'] = eachperson.xpath('//h3[@class="date"]/text()').extract()
    person['address'] = eachperson.xpath('//div[@class="address"]/p/text()').extract()
    yield person
But I have a bug. I have adapted my spider to a page (see below) so that you can reproduce it:

import scrapy
import requests

class TutoSpider(scrapy.Spider):
    name = "tuto"
    start_urls = [
            'file:///C:/Users/Me/Desktop/data.html'
        ]

    def parse(self, response):
        for quotechild in response.xpath("//div[@class='quote']"):
            print("\n\n", quotechild.extract())
            print("\n\n", quotechild.xpath('//span[@class="text"]/text()').extract())
The first print returns the expected content, but the second print returns a list of every span class="text" on the whole page, not just the ones inside quotechild.

I have followed a lot of other tutorials, but I can't find what I'm doing wrong.

I am running on a local file because the original page I work on renders its HTML through JavaScript. The .html file is just ...

Example of the first print:

<div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
        <span class="text" itemprop="text">“A woman is like a tea bag; you never know how strong it is until it's in hot water.”</span>
        <span>by <small class="author" itemprop="author">Eleanor Roosevelt</small>
        <a href="/author/Eleanor-Roosevelt">(about)</a>
        </span>
        ...
    </div>

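Starting an XPath expression with "//" makes it match from the document root, no matter which element you call it on.

To make the XPath relative to an element (searching only its descendants), start the expression with ".//" instead: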

>>> len(quotechild.xpath('//span[@class="text"]/text()'))
10
>>> len(quotechild.xpath('.//span[@class="text"]/text()'))
1
>>> quotechild.xpath('.//span[@class="text"]/text()').extract_first()
'“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”'
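
The same contrast can be reproduced without Scrapy. The sketch below is only an illustration, assuming a made-up two-quote page (PAGE is hypothetical) and using the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. ElementTree supports only a subset of XPath and refuses absolute paths on an element, so the document-wide search is run from the root here rather than with a leading "//" on the child.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-quote page, kept well-formed so ElementTree can parse it.
PAGE = """
<html><body>
  <div class="quote"><span class="text">first quote</span></div>
  <div class="quote"><span class="text">second quote</span></div>
</body></html>
"""

root = ET.fromstring(PAGE)

# Document-wide search: every matching span on the page.
all_texts = [s.text for s in root.findall(".//span[@class='text']")]

# Relative search: only the spans below each individual quote div.
per_quote = [
    [s.text for s in quote.findall(".//span[@class='text']")]
    for quote in root.findall(".//div[@class='quote']")
]

print(all_texts)
print(per_quote)
```

In the spider from the question, the corresponding change would be to prefix each inner expression with a dot, for example eachperson.xpath('.//h2[@class="name"]/text()').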

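Thank you! One question, though: when only quotechild is passed, how does Scrapy "know" what the document root is? quotechild is just a few divs, not the whole document.
quotechild is a Scrapy Selector, an object that knows (among other things) which document it is part of.
Oh, OK, that makes sense. Thanks!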