Python: Selenium web scraping with dynamic content and a hidden data table


python selenium dynamic web-scraping


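I really need this community's help.

I am using Selenium and Beautiful Soup in Python to scrape dynamic content. The problem is that the pricing data table never makes it into Python, even with the following code: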

html=browser.execute_script('return document.body.innerHTML')
sel_soup=BeautifulSoup(html, 'html.parser')

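However, I later found that if I click the "View all prices" button on the page before running the code above, the data table can be parsed in Python.

My question is: how can I parse and access the information in these hidden, dynamic td tags in Python without using Selenium to click every "View all prices" button? There are too many of them.

The URL of the site I am scraping is (link omitted), and the attached image shows the HTML of the dynamic data table I need.

Many thanks to this community for the help.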

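You should target the element after it has loaded and pass it to the script as arguments[0], rather than reading the whole page through document: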
html_of_interest=driver.execute_script('return arguments[0].innerHTML',element)
sel_soup=BeautifulSoup(html_of_interest, 'html.parser')

There are two practical cases:
1.
The element has not been loaded into the DOM yet, so you need to wait for it:
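from time import sleep
from bs4 import BeautifulSoup
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

browser.get("url")
# experimental and delay are placeholders for values you tune yourself
sleep(experimental)  # get() usually returns only after the page has loaded, but sometimes JS keeps running after onload

try:
    element = WebDriverWait(browser, delay).until(EC.presence_of_element_located((By.ID, 'your_id_of_interest')))
    print("element is ready, do the thing!")
    html_of_interest = browser.execute_script('return arguments[0].innerHTML', element)
    sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
except TimeoutException:
    print("Something's wrong!")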
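
To make the difference between the fixed sleep and the explicit wait more concrete, here is a minimal, Selenium-free sketch of the polling idea behind an explicit wait: check a condition repeatedly and return as soon as it holds, or give up after a timeout. The name wait_until and its parameters are hypothetical and only illustrate the concept; this is not Selenium's implementation.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without the condition becoming truthy.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for an element that only shows up on the third check.
    attempts = {"count": 0}

    def element_present():
        attempts["count"] += 1
        return "element" if attempts["count"] >= 3 else None

    print(wait_until(element_present, timeout=2.0, poll_interval=0.01))
```

Unlike a fixed sleep, this returns as soon as the condition is met, so a generous timeout costs nothing when the page is fast.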

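2.
The element sits inside a shadow root, and you need to expand the shadow root first. This is probably not your case, but I mention it here because it is relevant for future reference. Example: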
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()


def expand_shadow_element(element):
    # Ask the browser for the shadow root attached to this host element
    shadow_root = driver.execute_script('return arguments[0].shadowRoot', element)
    return shadow_root

driver.get("chrome://settings")
root1 = driver.find_element(By.TAG_NAME, 'settings-ui')  # Selenium 4 locator API

html_of_interest = driver.execute_script('return arguments[0].innerHTML', root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
sel_soup  # empty: the shadow root has not been expanded yet

shadow_root1 = expand_shadow_element(root1)

html_of_interest = driver.execute_script('return arguments[0].innerHTML', shadow_root1)
sel_soup = BeautifulSoup(html_of_interest, 'html.parser')
sel_soup
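
Whichever case applies, the string handed to BeautifulSoup is plain HTML, so rows hidden with CSS can be read without clicking anything, provided they are already in the DOM. If the site only fetches the rows when the button is clicked, they will not be in the markup and this will not help. The sketch below assumes hypothetical table markup (SAMPLE) and a hypothetical helper name (table_rows); the real page's structure may differ.

```python
from bs4 import BeautifulSoup


def table_rows(html_fragment):
    """Return each <tr> in the fragment as a list of stripped cell texts."""
    soup = BeautifulSoup(html_fragment, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no header or data cells
            rows.append(cells)
    return rows


# Hypothetical markup standing in for the innerHTML of the pricing table.
# The last row is hidden with CSS but is still part of the markup.
SAMPLE = """
<table id="prices">
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Basic</td><td>$10</td></tr>
  <tr style="display:none"><td>Pro</td><td>$25</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in table_rows(SAMPLE):
        print(row)
```

In practice the argument would be the html_of_interest string returned by execute_script rather than a hard-coded sample.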

Thank you very much for the detailed explanation! I will try it as soon as I can. Thanks again!