How to extract/scrape option values from a dynamic dropdown in Python?
Tags: python, selenium, web-scraping, webdriver


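I am trying to extract data from a web page where the options in the dropdowns load dynamically based on the user's input. I am using Selenium WebDriver to extract the data from the dropdowns. See the screenshot below.

The city dropdown options load after I select a state, and the station dropdown options load after I select a city.

So far, I have been able to extract the station names with this code: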

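import time
from selenium.webdriver.common.keys import Keys

# driver, cityDropdown and cityOptions are created earlier (not shown here)
citiesList = []
stationNameList = []
siteIdList = []

# Collect the city names, skipping the first option
for city in cityOptions[1:]:
    citiesList.append(city.text)

# Selenium 3 style lookups; Selenium 4 uses find_element(By.XPATH, ...) instead
stationDropDown = driver.find_element_by_xpath("//select[contains(@id,'stations')]")
stationOptions = stationDropDown.find_elements_by_tag_name('option')

# Pick each city in turn, then print the text of the station dropdown
for ele in citiesList:
    cityDropdown.send_keys(ele, Keys.RETURN)
    time.sleep(2)  # give the station options time to load
    stationDropDown.click()
    print(stationDropDown.text)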


Can someone help me extract the station IDs for each state and city?

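Try the approach below using Python. It is simple, direct, reliable, and fast, and it needs less code because it works with requests instead of a browser. I got the API URL from the website itself after inspecting the Network section of Google Chrome.

What exactly the script below does:

  • First, it makes a POST request using the API URL and the payload (it is important that this is a POST request) and fetches the returned data.
  • After getting the data, the script parses the JSON using json.loads from the json library.
  • Finally, it iterates over the list of stations one by one and prints details such as state name, city name, station name, and station ID.

[Screenshot: Network calls tab]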
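
The parse-and-iterate steps can be tried offline on a small hand-written sample before hitting the real API. This is only a sketch: the key names ('stations', 'stateID', 'stationsInCity', 'cityID', 'name', 'id') are the ones the answer's script reads, while the sample values and the helper name flatten_stations are made up for illustration. The real response may contain more fields.

```python
import json

# Hypothetical sample shaped like the response the script expects;
# every value here is invented for illustration.
SAMPLE = '''
{"stations": [
  {"stateID": "State A",
   "stationsInCity": [
     {"cityID": "City 1", "name": "Station X", "id": "site_001"},
     {"cityID": "City 2", "name": "Station Y", "id": "site_002"}
   ]}
]}
'''


def flatten_stations(raw_json):
    """Return one (state, city, station name, site id) tuple per station."""
    result = json.loads(raw_json)
    rows = []
    for state in result["stations"]:
        for station in state["stationsInCity"]:
            rows.append((state["stateID"], station["cityID"],
                         station["name"], station["id"]))
    return rows


for row in flatten_stations(SAMPLE):
    print(row)
```

Returning rows instead of printing inside the loop makes it easy to write the result to a CSV file or a DataFrame later.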

[Screenshot: output of the code below]


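Website URL, please?
Only the station ID is available on the website. Do you want to scrape only that, or something else?
Station ID and station name. Thanks @Vin.
Thanks for your help. The script works for me. I couldn't find the POST URL you used. I searched for the request in the Network tab.
I have added an image of the Network tab for reference. If this works for you, please upvote and accept the answer. Thanks.
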
import json

import requests


def scrape_aqi_site_id():
    url = 'https://app.cpcbccr.com/aqi_dashboard/aqi_station_all_india'  # API URL
    # Payload copied from the network request
    payload = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='
    # POST the payload to the API; verify=False turns off TLS certificate checks
    response = requests.post(url, data=payload, verify=False)
    result = json.loads(response.text)  # parse the JSON response
    for state in result['stations']:  # each entry holds one state and its stations
        print('=' * 120)
        print('Scraping station data for state : ' + state['stateID'])
        for station in state['stationsInCity']:  # each station listed for this state
            print('-' * 100)
            print('Scraping data for city and its station : City (' + station['cityID'] + ') & station (' + station['name'] + ')')
            print('City : ' + station['cityID'])
            print('Station Name : ' + station['name'])
            print('Station Site Id : ' + station['id'])
            print('-' * 100)
        print('Scraping of data for state : (' + state['stateID'] + ') is completed, now going for another one...')
        print('=' * 120)


scrape_aqi_site_id()
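
A note on the payload string: it is plain Base64, and decoding it gives a small JSON object with a millisecond timestamp and a timezone offset. The sketch below decodes it and shows how a fresh one could be built the same way. The helper names decode_payload and build_payload are made up here, and whether the server accepts a payload with a different timestamp is an assumption I have not verified.

```python
import base64
import json
import time

# Payload string from the script above
PAYLOAD = 'eyJ0aW1lIjoxNjAzMTA0NTczNDYzLCJ0aW1lWm9uZU9mZnNldCI6LTMzMH0='


def decode_payload(payload):
    """Base64-decode the payload and parse the JSON inside it."""
    return json.loads(base64.b64decode(payload))


def build_payload(time_ms, tz_offset_minutes):
    """Build a payload in the same format (hypothetical helper)."""
    body = json.dumps(
        {"time": time_ms, "timeZoneOffset": tz_offset_minutes},
        separators=(",", ":"),  # compact JSON, same as the captured payload
    )
    return base64.b64encode(body.encode("ascii")).decode("ascii")


print(decode_payload(PAYLOAD))
# {'time': 1603104573463, 'timeZoneOffset': -330}

# A payload for the current time, keeping the -330 offset of the original
fresh = build_payload(int(time.time() * 1000), -330)
```

The -330 value looks like what JavaScript's getTimezoneOffset() returns for Indian Standard Time (UTC+5:30), which would fit a payload generated in the browser.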