Python, Selenium and Scrapy not working together?
I want to scrape British Airways flight tickets and store them in MongoDB. I can get through the search form, but I can't capture the data that comes back. My spider:
from scrapy import Spider
from scrapy.selector import Selector
from scrapy.http import FormRequest
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
import time

from flight.items import FlightItem


class BASpider(Spider):
    name = "BA"
    allowed_domains = ["britishairways.com"]
    start_urls = [
        "http://www.britishairways.com/travel/home/public/en_za?DM1_Channel=PPC&DM1_Mkt=ZA&DM1_Campaign=APMEA_ZA_EN_PUREBRAND_MASTERBRAND&Brand=Y&gclid=CLvt24zsqMgCFUGg2wodds4Prw",
    ]

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self, response):
        self.driver.get(response.url)
        WebDriverWait(self.driver, 10).until(
            lambda s: s.find_element_by_xpath('//select[@id="depCountry"]').is_displayed())
        departCountry_form = self.driver.find_element_by_id('depCountry')
        departCity_form = self.driver.find_element_by_id('from')
        oneWay = self.driver.find_element_by_id('journeyTypeOW')
        oneWay.click()
        dest_form = self.driver.find_element_by_id('planTripFlightDestination')
        date_form = self.driver.find_element_by_id('depDate')
        butt = self.driver.find_element_by_class_name('button')
        departCountry_form.send_keys("South Africa")
        departCity_form.send_keys("Johannesburg")
        dest_form.send_keys("London")
        date_form.clear()
        date_form.send_keys("05/10/15")
        actions = ActionChains(self.driver)
        actions.click(butt)
        actions.perform()
        time.sleep(35)

    def parse_post(self, response):
        flightList = Selector(response).xpath('//table[@class="flightList directFlightsTable connectflights"]/tbody/tr')
        for flight in flightList:
            item = FlightItem()
            item['dTime'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" departure"]/div/div/span[1]/text()').extract()[0]
            item['aTime'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" arrival"]/span[1]/text()').extract()[0]
            item['flightNr'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" operator"]/div/div/span[2]/href').extract()[0]
            item['price_economy'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" priceselecter price-M ch3 col1"]/span/span[2]/label/text()').extract()[0]
            item['price_premium'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" priceselecter price-W ch3 col2"]/span/span[2]/label/text()').extract()[0]
            item['price_business'] = flight.xpath(
                'td[7]/table/tbody/tr/td[@class=" priceselecter price-C ch3 col3"]/span/span[2]/label/text()').extract()[0]
            yield item
        self.driver.close()
I don't get any errors; it just doesn't scrape anything.

The reason you're getting nothing is that the parse_post() method is never called.

In fact, you can instantiate the Selector from self.driver.page_source directly in the parse() callback:
selector = Selector(text=self.driver.page_source)
flightList = selector.xpath('//table[@class="flightList directFlightsTable connectflights"]/tbody/tr')
for flight in flightList:
# ...
I also have some spiders that use Selenium, so the two do work together; maybe you're just missing something. Can you share the logs, and confirm that each XPath actually returns data? Where does your code call parse_post?

I haven't checked your code, but as an answer to your question "Python, Selenium and Scrapy not working together?", I can tell you that you can definitely use Scrapy together with Selenium. The combination is useful when the target site has AJAX interactions that need to be driven through a browser. You can find an example here:

Where should I call parse_post()? Sorry, I'm new to this.

@Jaco OK, updated the answer. Note that you should probably avoid calling time.sleep() and use explicit waits via WebDriverWait instead; anyway, that's another story. Hope it helps.