Python Scrapy doesn't extract data from a particular field on the same page from which I have successfully extracted other data

Tags: python, python-2.7, scrapy, scrapy-spider

I am actually quite new to Scrapy and I don't know why I am not getting the information I want. I am using Scrapy on www.kayak.com and I want to extract the check-in and check-out times for all hotels in New York. I have successfully extracted other data from the same page that contains the check-in and check-out times, but I cannot extract data for those two fields.

My code looks like this:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from hotel_crawl.items import HotelCrawlItem
from bs4 import BeautifulSoup
import time
import urlparse

class MySpider(CrawlSpider):
    name = "kayaksite"
    allowed_domains = ["www.kayak.com"]
    start_urls = ["http://www.kayak.com/New-York-Hotels.15830.hotel.ksp"]

    # Follow the "Next" pagination links and parse each results page.
    rules = (
        Rule(LinkExtractor(
            restrict_xpaths=("//a[@class='actionlink pagenumber'][contains(text(),'Next')]",)),
            callback="parse_item", follow=True),
    )

    def parse_start_url(self, response):
        print "test"
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = HotelCrawlItem()

        # Hotel name, price range, review score and detail-page URL from the results list.
        name = response.xpath("//a[@class='hotelname hotelresultsname']//text()").extract()
        price = [BeautifulSoup(i).get_text() for i in response.xpath("//div[@class='pricerange']").extract()]
        review = response.xpath("//a[@class='reviewsoverview']/strong/text()").extract()
        url = response.xpath("//a[@class='hotelname hotelresultsname']//@href").extract()

        alldata = zip(name, price, review, url)

        for i in alldata:
            item['name'] = i[0]
            item['price'] = i[1]
            item['review'] = i[2]
            # Visit each hotel's detail page to pick up the remaining fields.
            request = scrapy.Request(urlparse.urljoin(response.url, i[3]), callback=self.parse_item2)
            request.meta['item'] = item
            yield request

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = HotelCrawlItem()
        name = response.xpath("//a[@class='hotelname hotelresultsname']//text()").extract()
        price = [BeautifulSoup(i).get_text() for i in response.xpath("//div[@class='pricerange']").extract()]
        review = response.xpath("//a[@class='reviewsoverview']/strong/text()").extract()
        url = response.xpath("//a[@class='hotelname hotelresultsname']//@href").extract()

        alldata = zip(name, price, review, url)

        for i in alldata:
            item['name'] = i[0]
            item['price'] = i[1]
            item['review'] = i[2]
            request = scrapy.Request(urlparse.urljoin(response.url, i[3]), callback=self.parse_item2)
            request.meta['item'] = item
            yield request

    def parse_item2(self, response):
        print "test--------------"
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = response.meta['item']
        item['location'] = response.xpath("//*[@id='detailsOverviewContactInfo']/div/span/span[1]/text()").extract()
        item['postcode'] = response.xpath("//*[@id='detailsOverviewContactInfo']/div/span/span[3]/text()").extract()
        # These are the two expressions that come back empty.
        item['check_in'] = response.xpath("//*[@id='goodToKnow']/div/div[2]/div[2]/text()").extract()
        item['check_out'] = response.xpath("//*[@id='goodToKnow']/div/div[2]/div[2]/text()").extract()
        yield item
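
As a side note, a quick way to see what the failing expression actually matches is to open one of the hotel detail pages in scrapy shell and run it next to an expression that does work. This is only a debugging sketch; the URL is a placeholder for whichever detail page the spider requests, and both XPaths are taken from the code above:

scrapy shell "<hotel-detail-page-url>"
# The expression used for check_in / check_out, which comes back empty:
response.xpath("//*[@id='goodToKnow']/div/div[2]/div[2]/text()").extract()
# A field that does extract correctly, for comparison:
response.xpath("//*[@id='detailsOverviewContactInfo']/div/span/span[1]/text()").extract()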

Comments:

Your check-in and check-out XPaths are not returning any values; your XPaths for the other attributes, such as location and postcode, work fine. Also, there are two check-in and check-out data points on that page. Please try the following XPath for check-in: //input[@name='checkin_date']/@value (the field below it is for the price).

Oh, I am trying to get the check-in and check-out times from the "Good to know" section, which you can see below the "Amenities" section, not from the input fields.
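
For reference, the commenter's suggestion would slot into parse_item2 roughly as below. Note that, as the asker points out, this reads the booking-form inputs rather than the "Good to know" section, and the checkout_date field name is only assumed by analogy with checkin_date; it is not given in the thread:

def parse_item2(self, response):
    item = response.meta['item']
    item['location'] = response.xpath("//*[@id='detailsOverviewContactInfo']/div/span/span[1]/text()").extract()
    item['postcode'] = response.xpath("//*[@id='detailsOverviewContactInfo']/div/span/span[3]/text()").extract()
    # Suggested in the comments: take the values from the form inputs instead.
    item['check_in'] = response.xpath("//input[@name='checkin_date']/@value").extract()
    # 'checkout_date' is an assumed field name, analogous to 'checkin_date'.
    item['check_out'] = response.xpath("//input[@name='checkout_date']/@value").extract()
    yield item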