Scraping all tooltips on a website using Selenium (Python)?

python html selenium web-scraping

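I am currently trying to scrape this website.

I want to scrape the text from all of the individual tooltips.

Below is the HTML of a typical element that I have to hover over: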

<div event_id="55591" class="dhx_cal_event_line past_event" style="position:absolute; top:2px; height: 42px; left:1px; width:750px;"><div>

<div class="dhtmlXTooltip tooltip" style="visibility: visible; left: 803px; bottom:74px;

I also tried the same code with "dhtmlXTooltip tooltip" instead of "dhx_cal_event_line past_event".

I really don't understand why

tool_tips=driver.find_elements_by_class_name("dhx_cal_event_line past_event")

does not work.

Can Beautifulsoup be used to solve this, given that the HTML is dynamic and constantly changing?

If you open the Network tab in Chrome DevTools and filter by XHR, you can see that the website makes a request to

http://schedule.townsville-port.com.au/spotschedule.php

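from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = 'http://schedule.townsville-port.com.au/spotschedule.php'
# verify=False disables certificate verification (avoids the SSLError)
r = requests.get(url, verify=False)
# The 'xml' parser requires lxml to be installed
soup = BeautifulSoup(r.text, 'xml')

# Map each event id to a dict of its child elements
transports = {}
events = soup.find_all('event')

for e in events:
    transport_id = e['id']
    # Skip bare text nodes, whose name is None
    transport = {child.name: child.text for child in e.children if child.name}
    transports[transport_id] = transport

pprint(transports)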

Output:

{'48165': {'IMO': '8201480',
       'app_rec': 'Approved',
       'cargo': 'Passenger Vessel (Import)',
       'details': 'Inchcape Shipping Services Pty Limited',
       'duration': '8',
       'end_date': '2018-02-17 14:03:00.000',
       'sectionID': '10',
       'start_date': '2018-02-17 06:44:00.000',
       'text': 'ARTANIA',
       'visit_id': '19109'},
 ...
}
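The same parsing step can be tried offline. The following is only a minimal sketch: the sample XML is hypothetical, reconstructed from the field names in the output above, and the real response layout may differ. It uses the standard library's xml.etree.ElementTree instead of Beautifulsoup, and parse_events is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the layout of the real response is an assumption
# based on the field names shown in the output above.
SAMPLE_XML = """<data>
  <event id="48165">
    <text>ARTANIA</text>
    <start_date>2018-02-17 06:44:00.000</start_date>
    <end_date>2018-02-17 14:03:00.000</end_date>
    <details>Inchcape Shipping Services Pty Limited</details>
  </event>
</data>"""


def parse_events(xml_text):
    """Return {event id: {child tag: child text}} for every <event> element."""
    root = ET.fromstring(xml_text)
    return {
        event.get("id"): {child.tag: (child.text or "").strip() for child in event}
        for event in root.iter("event")
    }


transports = parse_events(SAMPLE_XML)
print(transports["48165"]["text"])
```

Iterating over an ElementTree element yields only child elements, so no whitespace text nodes end up in the dictionary.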

The only way I found to get rid of the SSLError was to disable certificate verification with verify=False.

Note that start_date and end_date are in UTC, so you can either specify the timeshift query parameter:

import time

utc_offset = -time.localtime().tm_gmtoff // 60  # in minutes    
url = f'http://schedule.townsville-port.com.au/spotschedule.php?timeshift={utc_offset}'

Or convert the dates and store them as datetime objects, converting the times from UTC to your local time zone.
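A minimal sketch of that conversion, assuming the date strings always have the format shown in the output above ('2018-02-17 06:44:00.000'). DATE_FORMAT and parse_utc are illustrative names, not part of the site's API.

```python
from datetime import datetime, timezone

# Assumed format of start_date / end_date, taken from the sample output.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_utc(value):
    """Parse a date string from the feed and mark it as UTC."""
    return datetime.strptime(value, DATE_FORMAT).replace(tzinfo=timezone.utc)


start = parse_utc("2018-02-17 06:44:00.000")
# astimezone() with no argument converts to the system's local time zone.
local_start = start.astimezone()

print(start.isoformat())
print(local_start.isoformat())
```

Keeping the objects timezone-aware means the UTC value and the local value still compare equal, so either can be stored safely.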

find_elements_by_class_name("dhx_cal_event_line past_event") does not work because compound class names are not permitted. Also, the correct CSS selector is

find_elements_by_css_selector(".dhx_cal_event_line.past_event")
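To illustrate the compound class name point: a class attribute containing several space-separated names maps to a CSS selector in which each name is prefixed with a dot and the names are joined without spaces. A small sketch follows; classes_to_css is a hypothetical helper, not a Selenium function.

```python
def classes_to_css(class_attr):
    """Turn a space-separated class attribute into a CSS class selector."""
    return "".join("." + name for name in class_attr.split())


selector = classes_to_css("dhx_cal_event_line past_event")
print(selector)
```

The resulting string can be passed to find_elements_by_css_selector. In Selenium 4 the find_elements_by_* helpers were removed, so the equivalent call there is driver.find_elements(By.CSS_SELECTOR, selector), with By imported from selenium.webdriver.common.by.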

If you are considering Beautifulsoup at all, why not tag the question with Beautifulsoup? You have only tagged Selenium.

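Thanks, I have tagged it now.

@Andersson I read the ActionChains documentation, and it seems I would still need to locate the elements first. How can I find them? I tried find_elements_by_css_selector(".dhx_cal_event_line.past_event"), but it could not find the elements either. I did not get a "no-such-element" exception.

This is because these elements are generated dynamically, so you also need to implement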