Python: how to extract a player's information from the statistics page based on the HTML?
I'm trying to scrape some information from a website using Selenium (the URL is in the code below). The information I'm trying to get is under the player's "Statistics". My code currently opens the player's profile and then opens the player's statistics page. I'm trying to find a way to extract the information from the player's statistics page. Here is my code:

Tags: python, selenium, selenium-webdriver, web-scraping, webdriverwait
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
    soup = BeautifulSoup(driver.page_source,"lxml")
    try:
        dropdown = driver.find_element_by_xpath('//*[@id="playerPills"]/li[9]/a')
        dropdown.click()
        bm = driver.find_element_by_id('statisticsPill')
        bm.click()
        for i in soup.select('#statistics table.table tr'):
            print(i)
            data1 = [x.get_text(strip=True) for x in i.select("th,td")]
            print(data1)
    except ValueError:
        print("error")
The HTML of the statistics table (excerpt from the "Serve" section):
<th class="pct-data text-right"><i class="fa fa-percent"></i></th>
<th class="raw-data text-right" style="display: none;"><i class="fa fa-hashtag"></i></th>
</tr>
</thead>
<tbody>
<tr>
<td>Ace %</td>
<th class="text-right pct-data">23.4%</th>
<th class="raw-data text-right" style="display: none;">12942 / 55377</th>
</tr>
<tr>
<td>Double Fault %</td>
<th class="text-right pct-data">4.2%</th>
<th class="raw-data text-right" style="display:
Ace %
23.4%
12942 / 55377
Double Fault %
4.2%
To extract the player's information from the statistics page, you can use the following solution:
- Code block:
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    # `options=` replaces the deprecated `chrome_options=` keyword.
    # `executable_path` is the Selenium 3 way of pointing at chromedriver;
    # Selenium 4 passes the path through a Service object instead.
    driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")

    # Open the "Statistics" dropdown, then the "Statistics" entry inside it,
    # waiting until each link is clickable.
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
        (By.XPATH, "//ul[@id='playerPills']//a[@class='dropdown-toggle'][normalize-space()='Statistics']"))).click()
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable(
        (By.XPATH, "//ul[@class='dropdown-menu']//a[@id='statisticsPill'][normalize-space()='Statistics']"))).click()

    # Row labels (<td>) and value cells (<th>), located as two separate lists of visible elements.
    statistics_items = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located(
        (By.XPATH, "//table[@class='table table-condensed table-hover table-striped']//tbody//tr/td")))
    statistics_value = WebDriverWait(driver, 10).until(EC.visibility_of_any_elements_located(
        (By.XPATH, "//table[@class='table table-condensed table-hover table-striped']//tbody//tr//following::th[1]")))

    # Pair the two lists by position and print them.
    for item, value in zip(statistics_items, statistics_value):
        print('{} {}'.format(item.text, value.text))
- Console output:
Ace % 4.0%
Double Fault % 2.1%
1st Serve % 68.7%
1st Serve Won % 71.8%
2nd Serve Won % 57.3%
Break Points Saved % 66.3%
Service Points Won % 67.2%
Service Games Won % 85.6%
Ace Against % Return
Double Fault Against % 7.2%
1st Srv. Return Won % 3.4%
2nd Srv. Return Won % 34.2%
Break Points Won % 55.3%
Return Points Won % 44.9%
Return Games Won % 42.4%
Points Dominance 33.3%
Games Dominance Total
Break Points Ratio 1.29
Total Points Won % 2.31
Games Won % 1.33
Sets Won % 54.4%
Matches Won % 59.7%
Match Time 77.2%
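In the output above, some labels sit next to section headers rather than values (for example `Ace Against % Return` and `Games Dominance Total`), which suggests the two separately located lists drifted out of step before `zip` paired them. One alternative is to read each label and its cells from the same `<tr>`, so a label can only be paired with values from its own row. The sketch below does this with the standard library's `html.parser` on a hard-coded excerpt modelled on the HTML in the question. `SAMPLE_HTML`, `RowCollector` and `extract_rows` are illustrative names, not part of the original answer; with a real browser the input would be `driver.page_source` taken after the Statistics tab has loaded.

```python
from html.parser import HTMLParser

# Hypothetical excerpt modelled on the table markup shown in the question.
SAMPLE_HTML = """
<table>
  <thead>
    <tr>
      <th>Serve</th>
      <th class="pct-data text-right"><i class="fa fa-percent"></i></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Ace %</td>
      <th class="text-right pct-data">23.4%</th>
      <th class="raw-data text-right" style="display: none;">12942 / 55377</th>
    </tr>
    <tr>
      <td>Double Fault %</td>
      <th class="text-right pct-data">4.2%</th>
    </tr>
  </tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._in_tbody = False
        self._row = None      # cells of the row being read, or None
        self._cell = None     # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self._in_tbody = True
        elif tag == "tr" and self._in_tbody:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_tbody = False
        elif tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_rows(html):
    """Return one list per body row: [label, value, ...]."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for label, *values in rows:
    # The header row in <thead> is skipped, so it cannot shift the pairing.
    print(label, values)
```

Because hidden cells such as the `raw-data` column are still present in the page source, this approach also returns them, whereas the visibility-based waits above only see what is currently displayed.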
The problem is the position of this line:
soup = BeautifulSoup(driver.page_source,"lxml")
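It should come after you click the Statistics tab, because the table is only loaded at that point, and only then can soup parse it.
Final code: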
    from selenium import webdriver
    from bs4 import BeautifulSoup
    import time

    driver = webdriver.Chrome(executable_path=r'//path/chromedriver.exe')
    driver.get("http://www.ultimatetennisstatistics.com/playerProfile?playerId=4742")
    try:
        dropdown = driver.find_element_by_xpath('//*[@id="playerPills"]/li[9]/a')
        dropdown.click()
        bm = driver.find_element_by_id('statisticsPill')
        bm.click()
        driver.maximize_window()
        soup = BeautifulSoup(driver.page_source,"lxml")
        for i in soup.select('#statisticsOverview table tr'):
            print(i.text)
            data1 = [x.get_text(strip=True) for x in i.select("th,td")]
            print(data1)
    except ValueError:
        print("error")
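The ordering issue described above can be illustrated without a browser. `driver.page_source` returns a string snapshot of the page at the moment it is read, so a snapshot taken before the click never gains the table that appears afterwards. In the sketch below, `FakeDriver`, `CellText` and `parse_cells` are hypothetical stand-ins invented for the illustration (they are not Selenium or BeautifulSoup APIs), and the markup is a simplified assumption.

```python
from html.parser import HTMLParser


class FakeDriver:
    """Hypothetical stand-in for a browser: the table exists only after the click."""

    def __init__(self):
        self.page_source = '<div id="statistics"></div>'

    def click_statistics(self):
        # Simulates the tab's content being injected into the page by JavaScript.
        self.page_source = (
            '<div id="statistics"><table><tr>'
            "<td>Ace %</td><th>23.4%</th>"
            "</tr></table></div>"
        )


class CellText(HTMLParser):
    """Collect the text of every <td> and <th> cell."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._inside = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.cells[-1] += data


def parse_cells(html):
    parser = CellText()
    parser.feed(html)
    parser.close()
    return parser.cells


driver = FakeDriver()
early_snapshot = driver.page_source   # read before the click, as in the question's code
driver.click_statistics()
late_snapshot = driver.page_source    # read after the click, as in the corrected code

print(parse_cells(early_snapshot))    # the early snapshot contains no table cells
print(parse_cells(late_snapshot))     # the late snapshot contains the row
```

With a real browser the table may also load asynchronously after the click, so an explicit wait such as `WebDriverWait` may still be needed before reading `page_source`.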
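Comments:
"To extract information": which information?
@DebanjanB Sorry, I have edited the question.
Thanks for your help, this seems to work well. Can you recommend somewhere I can learn more about Selenium?
@smith Pick any site with Selenium tutorials and get your hands dirty with code. Whenever you are in doubt, feel free to ask a question. Try to get to the core of everything you come across. You will be an expert soon!
I noticed that your code does not work if the window is not maximized. Why is that? Just curious.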