Python: when I practice crawling, I get the error message 'selenium.common.exceptions.TimeoutException: Message:'
Tags: python, selenium, selenium-webdriver
Here is my crawling practice code:
from selenium import webdriver
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = webdriver.Chrome('C:/chromedriver_win32/chromedriver')
browser.implicitly_wait(5)
browser.set_window_size(1024, 768) # maximize_window(), minimize
browser.get('http://prod.danawa.com/list/?cate=112758&15main_11_02')
WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH,'//*[@id="dlMaker_simple"]/dd/div[2]/button[1]'))).click()
WebDriverWait(browser, 3).until(EC.presence_of_element_located((By.XPATH,'//*[@id="selectMaker_simple_priceCompare_A"]/li[15]/label'))).click()
time.sleep(2)
# current page
cur_page = 1
# crawling page all
target_crawl_num = 7
while cur_page <= target_crawl_num:
    soup = BeautifulSoup(browser.page_source, 'html.parser')
    # selecting main product list
    pro_list = soup.select('div.main_prodlist.main_prodlist_list > ul.product_list > li')
    # checking product list
    # print(pro_list)
    # current page print
    print('****** Current Page : {}'.format(cur_page), '******')
    print()
    for v in pro_list:
        if not v.find('div', class_ = "ad_header"):
            print(v.select('p.prod_name > a')[0].text.strip())
            # print(v.select('a.thumb_link > img')[0]['src']) << if I use this line, I get the error 'IndexError: list index out of range'. Why?
            print(v.select('p.price_sect > a')[0].text.strip())
            print()
    print()
    cur_page += 1
    if cur_page > target_crawl_num:
        print('Crawling Succeed')
        break
    del soup
    # next page click
    # XPATH
    WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH,'//*[@id="productListArea"]/div[5]/div/div/a[{}]'.format(cur_page)))).click()
    # CSS_SELECTOR
    # WebDriverWait(browser, 3).until(EC.presence_of_element_located((By.CSS_SELECTOR,'div.number_warp > a:nth-child[{}]'.format(cur_page)))).click()
    # wait 3sec
    time.sleep(3)
# close browser
browser.close()
When I run this code, it gets from page 1 to page 2 successfully. But when page 2 finishes, I get the following error message:
Traceback (most recent call last):
  File "C:\python_crawl\.vscode\section06-3.py", line 107, in <module>
    WebDriverWait(browser, 5).until(EC.presence_of_element_located((By.XPATH,'//*[@id="productListArea"]/div[5]/div/div/a[{}]'.format(cur_page)))).click()
  File "C:\python_crawl\lib\site-packages\selenium\webdriver\support\wait.py", line 80, in until
    raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
The terminal process "C:\WINDOWS\System32\WindowsPowerShell\v1.0\powershell.exe -Command python C:\python_crawl\.vscode\section06-3.py" terminated with exit code: 1.
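Is there anything I can do?

The error means that Selenium waited for the element you specified there, but could not find it before the timeout expired.
You may be using the wrong locator, or the element may be on another page or inside an iframe.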