Python: Scrapy error when building a list of URLs, XPath expression is invalid

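I am scraping the apartments.com website with Scrapy. I want to visit every page of the form apartments.com/boston-ma/X, where X is an integer representing the page number.

Once there, I want to extract all of the property URLs, which all have the class property-link. I will then write a parse_item for each property.

I am getting the error

ValueError: XPath error: //*[contains(@class, 'property-link'')]/@href

I don't know what is wrong with my XPath. Please advise.

Code: see the spider listing at the end of this page.

Thank you!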

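You are writing
apts = response.xpath("//*[contains(@class, 'property-link'')]/@href").extract()
You have to write
apts = response.xpath("//*[contains(@class, 'property-link')]/@href").extract()
You are adding two single quotes after property-link instead of one. The extra quote opens a second string literal that is never closed, so the expression is invalid.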
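
A stray quote like this is easy to miss by eye. The following is a minimal sketch of a sanity check that uses only the standard library; has_balanced_quotes is a hypothetical helper written for this illustration, not part of Scrapy or parsel, and it only detects unterminated string literals, not other XPath syntax errors.

```python
def has_balanced_quotes(xpath: str) -> bool:
    """Return True if every string literal in `xpath` is closed.

    XPath 1.0 literals are delimited by ' or " and have no escape
    sequences, so a single left-to-right scan is enough.
    """
    open_quote = None  # quote character of the literal we are inside, if any
    for ch in xpath:
        if open_quote is None:
            if ch in ("'", '"'):
                open_quote = ch  # a new literal starts here
        elif ch == open_quote:
            open_quote = None  # the current literal ends here
    return open_quote is None


# The two expressions from the question and the answer.
broken = "//*[contains(@class, 'property-link'')]/@href"
fixed = "//*[contains(@class, 'property-link')]/@href"

print(has_balanced_quotes(broken))  # False: the third quote is never closed
print(has_balanced_quotes(fixed))   # True
```

Running a check like this on an expression before handing it to response.xpath() would flag the broken version without needing a live response.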

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from apt.items import AptItem
from urllib.parse import urljoin

class AptSpider(CrawlSpider):
    name = "apt"
    allowed_domains = ["apartments.com"]
    start_urls = ["https://www.apartments.com/boston-ma/"]

    rules = (Rule(LinkExtractor(allow=r'[1-9]+/*'), callback='parse_urls', follow=True),)

    def parse_urls(self, response):
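        # This is the line that raises the ValueError: note the stray extra
        # quote after 'property-link' (see the answer above).
        apts = response.xpath("//*[contains(@class, 'property-link'')]/@href").extract()
        for a in apts:
            url = urljoin(response.url, a)
            # parse_item is only a commented-out stub below; once it exists as a
            # method, it has to be referenced as self.parse_item.
            yield scrapy.Request(url, callback=parse_item)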


    #def parse_item(self, response):
        #scrape data here
        #item = AptItem()
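
Two other parts of the spider, the pagination pattern in the Rule and the urljoin call, can be tried out with the standard library alone. The sketch below assumes that LinkExtractor applies its allow patterns as a regex search anywhere in the URL, which is my understanding of its behavior. The sample URLs, PAGE_PATTERN and is_allowed are hypothetical names made up for this illustration, and the tighter pattern is only a suggestion, not something the original answer proposed.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the spider's Rule.
ORIGINAL_PATTERN = re.compile(r"[1-9]+/*")
# Hypothetical tighter pattern: only URLs ending in /boston-ma/<page number>/.
PAGE_PATTERN = re.compile(r"/boston-ma/\d+/?$")

# Hypothetical URLs for illustration; not taken from the real site.
listing_page = "https://www.apartments.com/boston-ma/2/"
property_page = "https://www.apartments.com/example-property-42/"


def is_allowed(pattern, url):
    """Mimic an allow filter: the pattern may match anywhere in the URL."""
    return pattern.search(url) is not None


# The original pattern accepts any URL that contains a digit from 1 to 9,
# so it lets property pages through as well as listing pages.
print(is_allowed(ORIGINAL_PATTERN, listing_page))
print(is_allowed(ORIGINAL_PATTERN, property_page))

# The tighter pattern accepts the listing page and rejects the property page.
print(is_allowed(PAGE_PATTERN, listing_page))
print(is_allowed(PAGE_PATTERN, property_page))

# urljoin resolves an href against the URL of the page it was found on.
print(urljoin(listing_page, "/example-property-42/"))
```

If the assumption about search semantics holds, the original pattern would send many non-listing pages to parse_urls, which is harmless here but wasteful.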