Scrapy: selecting from a dropdown list (e.g. a date) on a web page

javascript python html scrapy

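I'm new to Scrapy and Python, and I'm trying to scrape data from the start URL below.

After logging in, this is my start URL:

start_urls = ["?"]

(a) From there, I need to interact with the web page and select ---Airport---

and then choose the airport, the date, and the time period.

How do I do that? I'd like to loop over all time periods and all past dates.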
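
One possible way to approach the loop in (a), as a minimal sketch under assumptions: a dropdown (`<select>`) is submitted as an ordinary form field, so each combination of date and time period can be expressed as a dictionary of form values. The field names `airport`, `date` and `timePeriod`, the airport code `SIN`, and the helper `build_form_payloads` are all hypothetical; the real names have to be read from the `name` attributes of the `<select>` elements in the page source.

```python
from datetime import date, timedelta
from itertools import product


def build_form_payloads(airport, last_day, days_back, time_periods):
    """Return one form-data dict per (past date, time period) combination."""
    # Dates strictly before last_day, most recent first.
    past_dates = [last_day - timedelta(days=n) for n in range(1, days_back + 1)]
    payloads = []
    for day, period in product(past_dates, time_periods):
        payloads.append({
            # All three keys are hypothetical; use the <select> name
            # attributes found in the page source instead.
            "airport": airport,
            "date": day.isoformat(),
            "timePeriod": str(period),
        })
    return payloads


payloads = build_form_payloads(
    "SIN", date(2015, 3, 10), days_back=2, time_periods=[0, 6, 12, 18]
)
print(len(payloads))  # 2 dates x 4 time periods
print(payloads[0])
```

Inside a spider callback, each payload could then be submitted with `FormRequest.from_response(response, formdata=payload, callback=self.parse_results)`, where `parse_results` is a hypothetical callback name. This only works if the page submits a regular HTML form; if the dropdowns are driven purely by JavaScript, a browser-automation tool may be needed instead.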

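I have looked at the page source with Firebug, but I can't show it here because I don't have enough reputation points to post images.
I also read a post that mentioned using Splinter.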

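(b) After the selection, I am taken to a page with links to the final pages, which contain the information I want. How do I collect those links and have Scrapy visit each one to extract the information?

- By using rules? Where should I put the rules/LinkExtractor?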

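I'm willing to work through this myself and would appreciate help finding posts that can guide me. I'm a student and have already spent more than a week on this. I have done the Scrapy tutorial and the Python tutorial, read the Scrapy documentation, and searched earlier posts on Stack Overflow, but I haven't found one that covers this.

Many thanks.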

My code so far, which logs in and defines the items to be scraped via XPath from the final target page:

`from scrapy.http import FormRequest, Request
# init_request() and initialized() are hooks provided by InitSpider,
# not by scrapy.Spider.
from scrapy.spiders.init import InitSpider

from tutorial.items import FlightItem


class flightSpider(InitSpider):
    name = "flight"
    allowed_domains = ["flightstats.com"]
    login_page = 'https://www.flightstats.com/go/Login/login_input.do;jsessionid=0DD6083A334AADE3FD6923ACB8DDCAA2.web1:8009?'
    start_urls = [
        "http://www.flightstats.com/go/HistoricalFlightStatus/flightStatusByFlight.do?"]

    def init_request(self):
        """This function is called before crawling starts."""
        return Request(url=self.login_page, callback=self.login)

    def login(self, response):
        """Generate a login request."""
        return FormRequest.from_response(
            response,
            formdata={'loginForm_email': 'marvxxxxxx@hotmail.com', 'password': 'xxxxxxxx'},
            callback=self.check_login_response)

    def check_login_response(self, response):
        """Check the response returned by a login request to see if we are successfully logged in."""
        # response.text is a str; response.body is bytes on Python 3.
        if "Sign Out" in response.text:
            self.log("\n\n\nSuccessfully logged in. Let's start crawling!\n\n\n")
            # Now the crawling can begin.
            return self.initialized()  # ****THIS LINE FIXED THE LAST PROBLEM*****
        else:
            self.log("\n\n\nFailed, Bad times :(\n\n\n")
            # Something went wrong, we couldn't log in, so nothing happens.

    def parse(self, response):
        for sel in response.xpath('/html/body/div[2]/div[2]/div'):
            item = FlightItem()
            # Paths start with "./" so they are evaluated relative to sel
            # rather than from the document root.
            item['flight_number'] = sel.xpath('./div[1]/div[1]/h2').extract()
            item['aircraft_make'] = sel.xpath('./div[4]/div[2]/div[2]/div[2]').extract()
            item['dep_date'] = sel.xpath('./div[2]/div[1]/div').extract()
            item['dep_airport'] = sel.xpath('./div[1]/div[2]/div[2]/div[1]').extract()
            item['arr_airport'] = sel.xpath('./div[1]/div[2]/div[2]/div[2]').extract()
            item['dep_gate_scheduled'] = sel.xpath('./div[2]/div[2]/div[1]/div[2]/div[2]').extract()
            item['dep_gate_actual'] = sel.xpath('./div[2]/div[2]/div[1]/div[3]/div[2]').extract()
            item['dep_runway_actual'] = sel.xpath('./div[2]/div[2]/div[2]/div[3]/div[2]').extract()
            item['dep_terminal'] = sel.xpath('./div[2]/div[2]/div[3]/div[2]/div[1]').extract()
            item['dep_gate'] = sel.xpath('./div[2]/div[2]/div[3]/div[2]/div[2]').extract()
            item['arr_gate_scheduled'] = sel.xpath('./div[3]/div[2]/div[1]/div[2]/div[2]').extract()
            item['arr_gate_actual'] = sel.xpath('./div[3]/div[2]/div[1]/div[3]/div[2]').extract()
            item['arr_terminal'] = sel.xpath('./div[3]/div[2]/div[3]/div[2]/div[1]').extract()
            item['arr_gate'] = sel.xpath('./div[3]/div[2]/div[3]/div[2]/div[2]').extract()

            yield item`