Python: How to get the JavaScript-generated HTML that "Inspect Element" shows in the browser?
I am trying to get the hours of the available time slots from this webpage (the boxes below the calendar). I have read other related questions and wrote this code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup
url = 'https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/'
wait_time = 10
options = Options()
options.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=options)
driver.get(url)
driver.switch_to.frame(0)
wait = WebDriverWait(driver, wait_time)
first_result = wait.until(presence_of_element_located((By.ID, "sb_main")))
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)
driver.quit()
After switching to the iframe that contains the time slots, this is what I get from printing soup:
<script id="time_slots_view" type="text/html"><div class="slots-view{{#ifCond (getThemeOption 'timeline_modern_display') '==' 'as_table'}} as-table{{/ifCond}}">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
{{_t 'available_services_on_this_day'}}
</div>
{{#if error_message}}
<div class="alert alert-danger alert-dismissible" role="alert">
{{error_message}}
</div>
{{/if}}
{{>emptyTimePart is_empty=is_empty is_loaded=is_loaded}}
<div id="sb_time_slots_container"></div>
{{> bookingTimeLegendPart legend="only_available" time_diff=0}}
</div>
</div>
</div></script>
<script id="time_slot_view" type="text/html"><div class="slot">
<a class="sb-cell free {{#ifPluginActive 'slots_count'}}{{#if available_slots}}has-available-slot{{/if}}{{/ifPluginActive}}" href="#{{bookingStepUrl time=time date=date}}">
{{formatDateTime datetime 'time' time_diff}}
{{#ifCond (getThemeOption 'timeline_show_end_time') '==' 1}}
-<span class="end-time">
{{formatDateTime end_datetime 'time' time_diff}}
</span>
{{/ifCond}}
{{#ifPluginActive 'slots_count'}}
{{#if available_slots}}
<span class="slot--available-slot">
{{available_slots}}
{{#ifConfigParam 'slots_count_show_total' '==' true}} / {{total_slots}} {{/ifConfigParam}}
</span>
{{/if}}
{{/ifPluginActive}}
</a>
</div></script>
whereas right-clicking > Inspect Element on the same part of the webpage gives me this:
<div class="slots-view">
<div class="timeline-wrapper">
<div class="tab-pd">
<div class="container-caption">
Orari d'inizio disponibili
</div>
<div id="sb_time_slots_container">
<div class="slot">
<a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
23:00
</a>
</div>
</div>
<div class="time-legend">
<div class="available">
<div class="circle">
</div>
- Disponibile
</div>
</div>
</div>
</div>
</div>
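How can I get the hours of the available slots (23:00 in this example) using selenium?

To get the desired response, you need to identify the right iframe (and switch to it). You were trying to switch to frame[0], but you need frame[1]. The code below removes the dependency on the index and uses XPath to locate the iframe instead.
XPath is then used again to identify all child divs of the element with id=sb_time_slots_container, and the code reads the text property, which is nested in the <a> inside each of those divs.
Both lookups go through wait.until so that the content has a chance to load.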
...
driver.get(url)
wait = WebDriverWait(driver, wait_time)
# Wait until the iframe exists then switch to it
iframe_element = wait.until(presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe')))
driver.switch_to.frame(iframe_element)
# Wait until the times exist then get an array of them
wait.until(presence_of_element_located((By.XPATH, '//*[@id="sb_time_slots_container"]/div')))
# find_elements_by_xpath was removed in Selenium 4.3; use find_elements(By.XPATH, ...)
all_time_elems = driver.find_elements(By.XPATH, '//*[@id="sb_time_slots_container"]/div')
# Iterate over each element and print the time out
for elem in all_time_elems:
    print(elem.find_element(By.TAG_NAME, "a").text)
driver.quit()
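On why the soup printed in the question shows only templates: the iframe appears to ship its markup as Handlebars-style templates inside <script type="text/html"> blocks, and an HTML parser treats everything inside a script block as raw text, not as elements. The snippet below is only a small illustration of that behaviour using the standard library's html.parser. PAGE_SOURCE is a trimmed-down stand-in written for this example, not the real page, and TagCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Trimmed-down stand-in for the iframe's page source (an assumption for this
# example): one template inside a script block, plus the container that the
# page's JavaScript has not filled in yet.
PAGE_SOURCE = """
<script id="time_slot_view" type="text/html"><div class="slot">
<a class="sb-cell free" href="#">{{formatDateTime datetime 'time' time_diff}}</a>
</div></script>
<div id="sb_time_slots_container"></div>
"""


class TagCollector(HTMLParser):
    """Records every start tag the parser actually treats as an element."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(PAGE_SOURCE)

# The <div class="slot"> and <a> inside the script block are never reported
# as elements; only the script tag itself and the empty container are.
print(collector.tags)
```

The slot markup only becomes real elements once the page's script renders the template into sb_time_slots_container, which is why the answer waits for the children of that container before reading them.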
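So this gets around the fact that it is all pulled in as a script?
Yes, the script is injecting HTML into the rendered page. wait.until ensures the content has loaded before you try to read it. You can add a try/except for selenium.common.exceptions.TimeoutException to handle the case where the content is not found within the specified time (i.e. the 10 seconds in OP's code).
Thanks. I guess this isn't possible using only requests?
Thanks, this works! What if a single parent element contains more than one child element?
@Alistair, no, the wait is essential. If you are only getting one element, wait.until can return it. If you are getting an array of elements, you need to use wait.until and then find_elements_by… . The example usages in the code above are getting the frame (single) versus getting the time divs (multiple).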
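The TimeoutException handling suggested in the comments could be sketched roughly as follows. This is not the answerer's code: fetch_slot_times is a hypothetical wrapper name, returning an empty list on timeout is just one possible choice, and the XPaths are copied from the answer above. It assumes Selenium 4 with Firefox and geckodriver available.

```python
def fetch_slot_times(url, wait_time=10):
    """Return the visible slot times, or [] if nothing loads within wait_time seconds."""
    # Imports are kept inside the function so that this sketch can be loaded
    # on a machine without Selenium; calling it still requires Selenium.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support.expected_conditions import presence_of_element_located
    from selenium.webdriver.support.ui import WebDriverWait

    slots_xpath = '//*[@id="sb_time_slots_container"]/div'
    options = Options()
    options.add_argument("-headless")
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, wait_time)
        # Single element: wait.until returns it directly.
        iframe = wait.until(
            presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe'))
        )
        driver.switch_to.frame(iframe)
        # Several elements: wait for the first one, then collect them all.
        wait.until(presence_of_element_located((By.XPATH, slots_xpath)))
        slots = driver.find_elements(By.XPATH, slots_xpath)
        return [slot.find_element(By.TAG_NAME, "a").text for slot in slots]
    except TimeoutException:
        # The iframe or the slot list did not appear in time.
        return []
    finally:
        # Always close the browser, whether or not the lookup succeeded.
        driver.quit()
```

Calling fetch_slot_times('https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/') would then return a list of time strings. Note that a day with no free slots also ends in a timeout here, so an empty list does not distinguish "nothing available" from "page failed to load".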