Python not scraping all of the requested data from a CSS list on a page
I am trying to scrape a web page, but even though I am providing the correct CSS (verified in Chrome inspect), Selenium does not scrape all of the data: it scrapes the first few pages as expected, then raises an error. I have re-tested the CSS and changed it several times, but Selenium/Python still does not scrape the data correctly. The error I keep getting is:
Traceback (most recent call last):
  File "C:/Users/Bain3/PycharmProjects/untitled4/Vpalmerbet1.py", line 1365, in <module>
    EC.element_to_be_clickable((By.CSS_SELECTOR, ('.match-pop-market a[href*="/sports/soccer/"]'))))
  File "C:\Users\Bain3\Anaconda3\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
You can see that Chrome inspect does detect this CSS selector.
My full code is:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

# Imports the snippet relies on but did not show:
import csv
import os
import time
from random import shuffle

driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

# Start with a fresh output file
try:
    os.remove('vtg121.csv')
except OSError:
    pass

driver.get('https://www.palmerbet.com/sports/soccer')

#SCROLL_PAUSE_TIME = 0.5
#clickMe = wait(driver, 3).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="TopPromotionBetNow"]')))
#if driver.find_element_by_css_selector('#TopPromotionBetNow'):
#    driver.find_element_by_css_selector('#TopPromotionBetNow').click()
#last_height = driver.execute_script("return document.body.scrollHeight")
#while True:
#    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
#    time.sleep(SCROLL_PAUSE_TIME)
#    new_height = driver.execute_script("return document.body.scrollHeight")
#    if new_height == last_height:
#        break
#    last_height = new_height

time.sleep(1)
clickMe = wait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, '//*[contains(@class,"filter_labe")]')))
clickMe.click()

options = driver.find_elements_by_xpath('//*[contains(@class,"filter_labe")]')
indexes = [index for index in range(len(options))]
shuffle(indexes)

for index in indexes:
    clickMe1 = wait(driver, 10).until(
        EC.element_to_be_clickable(
            (By.XPATH, '(//ul[@id="tournaments"]//li//input)[%s]' % str(index + 1))))
    clickMe1.click()

    # Team
    langs3 = driver.find_elements_by_css_selector("#mta_row td:nth-child(1)")
    langs3_text = []
    for lang in langs3:
        print(lang.text)
        langs3_text.append(lang.text)

    # Team odds
    langs = driver.find_elements_by_css_selector(
        "#mta_row .mpm_teams_cell_click:nth-child(2) .mpm_teams_bet_val")
    langs_text = []
    for lang in langs:
        print(lang.text)
        langs_text.append(lang.text)

    # Href -- this is the wait that raises the TimeoutException
    #clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
    #    (By.CSS_SELECTOR, '.match-pop-market a[href*="/sports/soccer/"]')))
    clickMe = wait(driver, 15).until(EC.element_to_be_clickable(
        (By.XPATH, "//*[@class='match-pop-market']//a[href*='/sports/soccer/']")))
    elems = driver.find_elements_by_css_selector('.match-pop-market a[href*="/sports/soccer/"]')
    elem_href = []
    for elem in elems:
        print(elem.get_attribute("href"))
        elem_href.append(elem.get_attribute("href"))
    print("NEW LINE BREAK")

    with open('vtg121.csv', 'a', newline='', encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        for row in zip(langs_text, langs3_text, elem_href):
            writer.writerow(row)
            print(row)
Your XPath is incorrect. Note that a predicate like [href*="/sports/soccer/"] can be used in a CSS selector, but in XPath you have to write [contains(@href, '/sports/soccer/')] instead. So the full line should be:
from selenium.common.exceptions import TimeoutException

try:
    clickMe = wait(driver, 15).until(
        EC.element_to_be_clickable(
            (By.XPATH, "//*[@class='match-pop-market']//a[contains(@href, '/sports/soccer/')]")))
    clickMe.click()
except TimeoutException:
    print("No link was found")
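To see why the two spellings are equivalent, the shared semantics of CSS `a[href*="…"]` and XPath `//a[contains(@href, '…')]` — "keep anchors whose href contains a substring" — can be sketched with the standard library alone (this is an illustration of the matching rule, not how Selenium locates elements internally; the sample HTML is made up):

```python
from html.parser import HTMLParser

# Collect <a> tags whose href CONTAINS a substring, i.e. the rule behind
# CSS   a[href*="/sports/soccer/"]
# XPath //a[contains(@href, '/sports/soccer/')]
class HrefFilter(HTMLParser):
    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.matches = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get('href', '')
        if tag == 'a' and self.needle in href:
            self.matches.append(href)

html = '''
<a href="/sports/soccer/epl">EPL</a>
<a href="/sports/tennis/atp">ATP</a>
<a href="/sports/soccer/a-league">A-League</a>
'''
f = HrefFilter('/sports/soccer/')
f.feed(html)
print(f.matches)  # only the two soccer links match
```

The broken line in the question, `a[href*='/sports/soccer/']` inside an XPath, is silently treated as a different (and never-matching) expression, which is why the wait times out.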
Comments: Please keep each post to a single question, and put images and all relevant information directly in the post rather than linking external services, since linked content gets deleted and the post becomes useless for future reference. @Loïc Good point, I removed all the external links and put everything relevant into the question. Yes, I adjusted the expression to //div[contains(@class,"match-pop-market")]//a[contains(@href,"/sports/soccer/")], which works in Chrome inspect, yet the error persists, which is odd; removing that line entirely seems to produce better results. If the href simply doesn't exist on the pages where Selenium raises the error, then wrapping the wait in a try/except block works very well. What I still find strange is that all the selectors look correct, yet it scrapes some pages and not others; I assume it is trying to scrape elements that are never displayed correctly, though that doesn't explain why the hrefs are always scraped.
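One side effect worth noting when some pages come up short: the CSV-writing loop uses `zip()`, which stops at the shortest input, so if the href scrape returns fewer items than the team and odds scrapes, rows are silently dropped rather than raising an error. A minimal sketch with made-up data:

```python
import csv
import io

# Three scraped columns; the href scrape came up one short on this page.
odds = ['1.50', '2.75', '3.10']
teams = ['Team A', 'Team B', 'Team C']
hrefs = ['/sports/soccer/a', '/sports/soccer/b']

buf = io.StringIO()
writer = csv.writer(buf)
# zip() truncates to the shortest list, so Team C never reaches the file.
for row in zip(odds, teams, hrefs):
    writer.writerow(row)

print(buf.getvalue())  # only two rows written
```

If you want missing values padded instead of dropped, `itertools.zip_longest` with a fill value makes the gaps visible in the output file.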