
Python BeautifulSoup doesn't find a table on the webpage


I'm trying to get the data from the first table on a website. I've looked at similar questions here and tried some of the given solutions, but I can't seem to find the table and, ultimately, the data in it.

This is what I tried:

from bs4 import BeautifulSoup  
from selenium import webdriver  
driver = webdriver.Chrome('C:\\folder\\chromedriver.exe')  
url = 'https://docs.microsoft.com/en-us/windows/release-information/'  
driver.get(url)  

tbla = driver.find_element_by_name('table') #attempt using by element name  
tblb = driver.find_element_by_class_name('cells-centered') #attempt using by class name  
tblc = driver.find_element_by_xpath('//*[@id="winrelinfo_container"]/table[1]') #attempt by using xpath  
And I also tried with Beautiful Soup:

html = driver.page_source
soup = BeautifulSoup(html,'html.parser')
table = soup.find("table", {"class": "cells-centered"})
print(len(table))

Any help is greatly appreciated.

The table is inside an iframe. You need to switch to the iframe first in order to access the table.
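
At its simplest, that switch is a single call; a minimal sketch, assuming the frame id winrelinfo_iframe that is used in the snippets below:

driver.switch_to.frame("winrelinfo_iframe")  # enter the iframe that hosts the table
# ... locate the table here ...
driver.switch_to.default_content()           # return to the main document when done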

Induce WebDriverWait() and wait for frame_to_be_available_and_switch_to_it() with the iframe locator, then induce WebDriverWait() again and wait for visibility_of_element_located() with the following locator for the table:

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.cells-centered")))
You need to import the following libraries:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Or use the code below with an xpath locator:

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]')))
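
Once the table element has been located (with either locator), its rows can also be read directly through Selenium; a minimal sketch of one possible continuation, not part of the original answer:

rows = table.find_elements(By.TAG_NAME, "tr")  # 'table' is the element located above
for row in rows:
    print(row.text)  # one line of cell text per row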

You can go further and load the table data into a pandas DataFrame and then export it to a csv file. You will need to import pandas.

driver.get("https://docs.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')
df=pd.read_html(str(table))[0]
print(df)
df.to_csv("path/to/csv")
To install pandas:

pip install pandas

Then add the import below:

import pandas as pd
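
Putting the pieces of this answer together, a complete end-to-end version might look like the sketch below; the chromedriver path comes from the question and the csv path is the answer's placeholder, so adjust both to your machine:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome('C:\\folder\\chromedriver.exe')
driver.get("https://docs.microsoft.com/en-us/windows/release-information/")

# switch into the iframe, grab the first table's HTML, then hand it to pandas
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "winrelinfo_iframe")))
table = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.XPATH, '//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')

df = pd.read_html(table)[0]
print(df)
df.to_csv("path/to/csv")
driver.quit()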

The table is loaded inside an iframe, so BeautifulSoup doesn't see it in the original page; you have to request the iframe's src separately:

import requests 
from bs4 import BeautifulSoup


url = 'https://docs.microsoft.com/en-us/windows/release-information/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
soup = BeautifulSoup(requests.get(soup.select_one('iframe')['src']).content, 'html.parser')

for row in soup.select('table tr'):
    print(row.get_text(strip=True, separator='\t'))
This prints:

Version Servicing option    Availability date   OS build    Latest revision date    End of service: Home, Pro, Pro Education, Pro for Workstations and IoT Core End of service: Enterprise, Education and IoT Enterprise
2004    Semi-Annual Channel 2020-05-27  19041.546   2020-10-01  2021-12-14  2021-12-14  Microsoft recommends
1909    Semi-Annual Channel 2019-11-12  18363.1110  2020-09-16  2021-05-11  2022-05-10
1903    Semi-Annual Channel 2019-05-21  18362.1110  2020-09-16  2020-12-08  2020-12-08
1809    Semi-Annual Channel 2019-03-28  17763.1490  2020-09-16  2020-11-10  2021-05-11
1809    Semi-Annual Channel (Targeted)  2018-11-13  17763.1490  2020-09-16  2020-11-10  2021-05-11
1803    Semi-Annual Channel 2018-07-10  17134.1726  2020-09-08  End of service  2021-05-11

...and so on.
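
If the end goal is a DataFrame rather than printed text, the same requests-based approach can be combined with pandas.read_html; a minimal sketch building on the snippet above (pandas needs lxml or html5lib installed to parse the HTML):

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://docs.microsoft.com/en-us/windows/release-information/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
iframe_src = soup.select_one('iframe')['src']        # same iframe resolution as above

df = pd.read_html(requests.get(iframe_src).text)[0]  # first table inside the iframe
print(df.head())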

Ah, iframes... still so much to learn :-)

This is exactly what I needed. My next step is to create a df and import the data into a SQL database, so this is perfect, thank you so much @KunduK, greatly appreciated!!

This also solved the problem, thanks for your help, much appreciated.
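
For the follow-up mentioned in these comments (getting the DataFrame into a SQL database), pandas' to_sql can write straight to, for example, a local SQLite file; a minimal sketch, where the csv path is the answer's placeholder and the database and table names are made up for illustration:

import sqlite3
import pandas as pd

df = pd.read_csv("path/to/csv", index_col=0)  # the csv exported in the answer above
conn = sqlite3.connect("windows_releases.db")  # hypothetical database file
df.to_sql("release_info", conn, if_exists="replace", index=False)  # hypothetical table name
conn.close()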