How to find data under a pre tag in Python


python html selenium


I want to use Python to get some data from under a pre tag in an HTML page.

I first tried Selenium, but it cannot find the element by XPath:

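from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Ie()
wait = WebDriverWait(browser, 5)
browser.get('file:///my_url.html')
body = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/pre[2]")))
print(body.text)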

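I also tried bs4. However, BeautifulSoup keeps telling me that my browser does not support the Frames extension. I am not familiar with bs4 and could not find a useful solution. Can anyone tell me how to change the IE browser settings so the data can be read successfully? Thanks.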

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = " "  # this HTML page is on a network drive and can be opened by IE, Chrome, ...
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

for script in soup(["script", "style"]):
    script.extract()    # rip it out

text = soup.get_text()
lines = (line.strip() for line in text.splitlines())
chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
text = '\n'.join(chunk for chunk in chunks if chunk)

print(text)

>>>This page is designed to be viewed by a browser which supports Frames extension. 
This text will be shown by browsers which do not support the Frames extension.

Your pre element is inside a frame named "glhstry_main", so you need to switch to that frame before accessing the element. Like this:

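from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

browser = webdriver.Ie()
wait = WebDriverWait(browser, 5)
browser.get('file:///my_url.html')
browser.switch_to.frame("glhstry_main")  # switch into the frame
body = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/pre[2]")))
print(body.text)
# do your frame stuff
browser.switch_to.default_content()  # switch back to the top-level document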
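
The "Frames extension" message from the BeautifulSoup attempt suggests the outer page is only a frameset, so the pre data probably lives in a separate document referenced by the frame's src attribute. If a browser is not required, that document can be located and parsed directly. Below is a minimal sketch under that assumption. It uses only the standard library's html.parser instead of BeautifulSoup, and the sample markup, the base URL, the file names, and the frame name glhstry_menu are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameFinder(HTMLParser):
    """Collect {name: src} for every <frame> or <iframe> tag in a page."""

    def __init__(self):
        super().__init__()
        self.frames = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attr = dict(attrs)
            if attr.get("src"):
                self.frames[attr.get("name", "")] = attr["src"]


class PreCollector(HTMLParser):
    """Collect the text of every <pre> element, in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1
            if self._depth == 1:
                self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.blocks[-1] += data


def frame_url(outer_html, base_url, frame_name):
    """Resolve the src of the named frame against the outer page's URL."""
    finder = FrameFinder()
    finder.feed(outer_html)
    finder.close()
    return urljoin(base_url, finder.frames[frame_name])


def pre_blocks(frame_html):
    """Return the text of each <pre> element found in the frame document."""
    collector = PreCollector()
    collector.feed(frame_html)
    collector.close()
    return collector.blocks


# Hypothetical sample documents standing in for the real pages.
outer = """<html><frameset cols="20%,80%">
<frame name="glhstry_menu" src="menu.html">
<frame name="glhstry_main" src="main.html">
<noframes>This page is designed to be viewed by a browser
which supports Frames extension.</noframes>
</frameset></html>"""
inner = "<html><body><pre>header</pre><pre>row 1\nrow 2</pre></body></html>"

print(frame_url(outer, "file:///share/report/index.html", "glhstry_main"))
print(pre_blocks(inner)[1])  # index 1 corresponds to pre[2] in the XPath
```

In real use, the URL returned by frame_url would be passed to urlopen and the response body to pre_blocks. This only works when the frame content is static HTML; if the frame is filled in by scripts, the Selenium approach above is the safer route.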

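Which page are you trying to get data from? At least provide the HTML structure of the page.

Many of the data sections are stored under the frameset.

Please don't use images. As far as the code is concerned, your document is useless here.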