Python 3.x: Scrapy request callback doesn't enter the next function

python-3.x scrapy


This is my code:

import scrapy

class BookingSpider(scrapy.Spider):
    name = 'booking-spider'
    allowed_domains = ['booking.com']
    start_urls = [
        'https://www.booking.com/country.de.html?aid=356980;label=gog235jc-1DCAIoLDgcSAdYA2gsiAEBmAEHuAEHyAEP2AED6AEB'
        '-AECiAIBqAIDuAK7q7DyBcACAQ;sid=8de61678ac61d10a89c13a3941fd3dcd'
    ]

    # get country page
    def parse(self, response):

        for countryurl in response.xpath('//a[contains(text(),"Schweiz")]/@href'):
            url = response.urljoin(countryurl.extract())
            print("COUNTRYURL", url)
            yield scrapy.Request(url, callback=self.parse_country)

    # get page of all hotels in a country
    def parse_country(self, response):

        for hotelsurl in response.xpath('//a[@class="bui-button bui-button--secondary"]/@href'):
            url = response.urljoin(hotelsurl.extract())
            print("HOTELURL", url)
            yield scrapy.Request(url, callback=self.parse_hotel)

    def parse_hotel(self, response):

        print("entering parse_hotel")
        hotelurl = response.xpath('//*[(@ id = "hp_hotel_name")]')
        print("URL", hotelurl)

The spider never enters the parse_hotel function, and I don't understand why.

Where is my mistake? Thanks in advance for your suggestions.

The problem is in this line:

response.xpath('//a[@class="bui-button bui-button--secondary"]/@href')

Here, your XPath extracts URLs like this:

https://www.booking.com/searchresults.de.html?dest_id=204;dest_type=country&

But they should look like this:

https://www.booking.com/searchresults.de.html?label=gen173nr-1DCAIoLDgcSAdYBGhSiAEBmAEHuAEHyAEM2AED6AEB-AECiAIBqAIDuAKz_uDyBcACAQ;sid=a3807e20e99c61282850cfdf02041c07;dest_id=204;dest_type=country&

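Because of this, your spider tries to open the same page again, and that request is blocked by Scrapy's dupefilter. That is why the callback is never called.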
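To illustrate why a filtered request never reaches its callback, here is a minimal sketch of a duplicate filter. It is a simplified model, not Scrapy's implementation: Scrapy's filter compares request fingerprints rather than raw URL strings, and the names SeenUrlFilter and crawl_order are made up for this example.

```python
class SeenUrlFilter:
    """Toy duplicate filter: remembers every URL it has been shown."""

    def __init__(self):
        self.seen = set()

    def should_fetch(self, url):
        # A URL that was already scheduled is dropped silently,
        # so whatever callback was attached to it never runs.
        if url in self.seen:
            return False
        self.seen.add(url)
        return True


def crawl_order(urls):
    """Return the URLs that survive the filter, in scheduling order."""
    dupefilter = SeenUrlFilter()
    return [url for url in urls if dupefilter.should_fetch(url)]


if __name__ == "__main__":
    scheduled = [
        "https://www.booking.com/searchresults.de.html?dest_id=204;dest_type=country&",
        "https://www.booking.com/searchresults.de.html?dest_id=204;dest_type=country&",
    ]
    # Only the first of the two identical URLs is fetched.
    print(crawl_order(scheduled))
```

In Scrapy itself, setting DUPEFILTER_DEBUG = True in the project settings logs every filtered request, and passing dont_filter=True to scrapy.Request bypasses the filter for that one request. Both are useful for confirming that the dupefilter is what drops the request.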

I think the missing part of the URL is generated by JavaScript.
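If the missing part really is just the label and sid parameters, one possible workaround is to copy them from the URL of the page being parsed onto each extracted link before yielding the request. The sketch below is only an assumption about what might help: the helper carry_session_params is hypothetical, and whether booking.com accepts the resulting URL is untested.

```python
import re


def carry_session_params(page_url, link_url, keys=("label", "sid")):
    """Copy selected query parameters from page_url onto link_url.

    Hypothetical helper: parameters named in `keys` that appear in
    page_url but not in link_url are placed in front of link_url's query.
    """
    def split_query(url):
        base, _, query = url.partition("?")
        # URLs of this style mix ';' and '&' as parameter separators.
        parts = [part for part in re.split(r"[;&]", query) if part]
        return base, parts

    _, page_parts = split_query(page_url)
    base, link_parts = split_query(link_url)
    present = {part.split("=", 1)[0] for part in link_parts}
    carried = [
        part for part in page_parts
        if part.split("=", 1)[0] in keys and part.split("=", 1)[0] not in present
    ]
    return base + "?" + ";".join(carried + link_parts)


if __name__ == "__main__":
    # Example values are placeholders, not real session identifiers.
    page = "https://www.booking.com/country/ch.de.html?label=abc;sid=123"
    link = "https://www.booking.com/searchresults.de.html?dest_id=204;dest_type=country&"
    print(carry_session_params(page, link))
```

Inside parse_country, this would be called as carry_session_params(response.url, url) before passing the result to scrapy.Request.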

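Can you explain what output you expect from the program?

The program should reach the hotel page, where I want to extract the data. I want to see the XPath output in the parse_hotel function, but it stays stuck in the previous function.

That's true! The same thing happens in the first function too. Thanks for your help!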