Python: How do I find the correct XPath and loop over a table?
I want to get all the values from the "Elektriciteit NL" table on the page whose URL appears in the code below. However, after endless attempts to find the correct XPath with Selenium, I still cannot scrape the table. I used "inspect" and copied the XPath from the table to determine the table's length, so that I could scrape it afterwards. When that failed, I tried using "contains", which did not work either. After that I tried a few other things, without any luck.
#%%
import pandas as pd
from selenium import webdriver
#%% powerhouse Elektriciteit NL base & peak
url = "https://powerhouse.net/forecast-prijzen-onbalans/"
#%% open the web page
# `path` is the folder that contains chromedriver.exe (it is not defined in the post)
driver = webdriver.Chrome(executable_path = path + 'chromedriver.exe')
driver.get(url)
#%%
prices = []
# loop over the values in the table
for j in range(len(driver.find_elements_by_xpath('//tr[@id="endex_nl_forecast"]/div[3]/table/tbody/tr[1]/td[4]'))):
    base = driver.find_elements_by_xpath('//tr[@id="endex_nl_forecast"]/div[3]/table/tbody/tr[1]/td[4]')[j]
#%%
#trying with BeautifulSoup
from bs4 import BeautifulSoup
import requests
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find('table', id = 'endex_nl_forecast')
rows = soup.find_all('tr')
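I would like to get the table into a DataFrame and understand how XPath actually works. I am fairly new to the whole concept.

If you are open to approaches other than XPath, you can do this without Selenium or XPath. You can use pandas: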
import pandas as pd
table = pd.read_html('https://powerhouse.net/forecast-prijzen-onbalans/')[4]
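If you need a text representation of the icons, you can extract the svg class name that describes the arrow direction from the corresponding td elements: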
from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
r = requests.get('https://powerhouse.net/forecast-prijzen-onbalans/')
soup = bs(r.content, 'lxml')
table = soup.select_one('#endex_nl_forecast table')
rows = []
headers = [i.text for i in table.select('th')]
for tr in table.select('tr')[1:]:
    # use the cell text, or the last part of the svg class name when the cell holds an icon
    rows.append([i.text if i.svg is None else i.svg['class'][2].split('-')[-1] for i in tr.select('td') ])
df = pd.DataFrame(rows, columns = headers)
print(df)
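To make the icon handling in the list comprehension easier to follow, here is a minimal sketch of just that step. The class names below are hypothetical (the real ones on the page are not shown in this thread), and both helper functions are illustrative names, not part of BeautifulSoup.

```python
# Hypothetical class list, in the form BeautifulSoup returns for a
# multi-valued class attribute (a list of strings).
svg_classes = ["svg-inline--fa", "fa-w-10", "fa-arrow-up"]


def arrow_direction(classes, position=2):
    """Return the text after the last hyphen of the class at `position`."""
    return classes[position].split("-")[-1]


def arrow_direction_safe(classes):
    """Same idea, but look for a class containing 'arrow' instead of a fixed index."""
    for name in classes:
        if "arrow" in name:
            return name.split("-")[-1]
    return None  # no arrow class found


direction = arrow_direction(svg_classes)
safe_direction = arrow_direction_safe(svg_classes)
print(direction, safe_direction)
```

The fixed index mirrors the answer's code. The second variant is a possible alternative if the order of the class names turns out to vary between cells.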
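You can use the Selenium driver to locate the table and its contents:
import time
from selenium.webdriver.common.by import By
# `driver` is the webdriver.Chrome instance created in the question
url = 'https://powerhouse.net/forecast-prijzen-onbalans/'
driver.get(url)
time.sleep(3)  # give the page a moment to render
Read the table headers and print them:
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which recent Selenium 4 releases no longer provide
tableHeader = driver.find_elements(By.XPATH, "//*[@id='endex_nl_forecast']//thead//th")
print(tableHeader)
for header in tableHeader:
    print(header.text)
Find the number of rows in the table:
rowElements = driver.find_elements(By.XPATH, "//*[@id='endex_nl_forecast']//tbody/tr")
print('Total rows in the table:', len(rowElements))
Print each row as is:
for row in rowElements:
    print(row.text)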
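Since the question also asks how XPath works, here is a small offline sketch of why the expression in the question presumably matched nothing while the one in this answer does. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath, on hypothetical and heavily simplified markup. The real page structure is an assumption: the thread only suggests that the element carrying the id contains the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a wrapper element carries the id and contains the table.
html = """
<html><body>
  <div id="endex_nl_forecast">
    <table>
      <thead><tr><th>Product</th><th>Base</th><th>Peak</th></tr></thead>
      <tbody>
        <tr><td>Cal-A</td><td>10</td><td>12</td></tr>
        <tr><td>Cal-B</td><td>11</td><td>13</td></tr>
      </tbody>
    </table>
  </div>
</body></html>
""".strip()
root = ET.fromstring(html)

# The question's path starts at a <tr> with the id. In this markup no <tr>
# has that id, so nothing matches. Even if the start matched, tr[1]/td[4]
# pins the path to a single cell, so the result could never be a full table.
wrong = root.findall(
    ".//tr[@id='endex_nl_forecast']/div[3]/table/tbody/tr[1]/td[4]"
)

# '*' matches any tag with the id; '//' then descends any number of levels.
base = ".//*[@id='endex_nl_forecast']"
headers = [th.text for th in root.findall(base + "//thead//th")]
rows = root.findall(base + "//tbody/tr")
data = [[td.text for td in row.findall("td")] for row in rows]

print(len(wrong), headers, data)
```

The same two ideas carry over to Selenium: anchor on the id with `*` rather than guessing the tag, and stop the path at `tr` so that every row is returned instead of one cell.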
The second solution works well. Thanks!