
How to handle an error outside of a try block using Selenium and Python


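I want to run a search with Selenium and then click the "More Results" button at the end of the DDG search results.

DDG stops showing the button once it has displayed all results for the query.

If the button is not there, I want to exit the try block.

I'll share what I am trying. I also tried these two options earlier:
if len(button_element) > 0: button_element.click()
and
if button_element is not None: button_element.click()

I would like the solution to use Selenium so that the browser is shown, because that helps with debugging.

Here is my code, with a reproducible example: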

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.chrome.options import Options
    from bs4 import BeautifulSoup

    browser = webdriver.Chrome()        
    browser.get("https://duckduckgo.com/")
    search = browser.find_element_by_name('q')
    search.send_keys("this is a search" + Keys.RETURN)
    html = browser.page_source

    try:
        button_element = browser.find_element_by_class_name('result--more__btn')

        try:
            button_element.click()
        except SystemExit:
            print("No more pages")

    except:
        pass

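Use WebDriverWait to wait for the More button to appear:

    wait = WebDriverWait(browser, 15)  # 15 second timeout
    wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))

This example code clicks the More button until there is no More button left. For Chrome, replace firefox with chrome.

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions

    browser = webdriver.Firefox()
    browser.get("https://duckduckgo.com/")
    search = browser.find_element_by_name('q')
    search.send_keys("this is a search" + Keys.RETURN)

    while True:
        try:
            wait = WebDriverWait(browser, 15)  # 15 second timeout
            wait.until(expected_conditions.visibility_of_element_located((By.CLASS_NAME, 'result--more__btn')))
            button_element = browser.find_element_by_class_name('result--more__btn')
            button_element.click()
        except Exception:  # typically a timeout once no button appears
            break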
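The loop above treats any failure inside the try block as "no more button". A minimal sketch of a stricter variant follows: only the lookup is guarded, the loop exits when the lookup fails, and errors from the click itself are not swallowed. The names `click_until_gone`, `FakeButton` and `make_finder` are hypothetical and exist only for illustration; the fake objects stand in for a browser so the sketch runs without Selenium.

```python
def click_until_gone(find_button, not_found, max_clicks=100):
    """Click the element returned by find_button() until the lookup fails.

    find_button: zero-argument callable that returns an object with .click()
                 or raises one of the `not_found` exceptions.
    not_found:   exception class (or tuple of classes) meaning "no button".
    max_clicks:  safety cap so a page that always offers a button cannot
                 keep the loop running forever.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            button = find_button()  # only the lookup is guarded
        except not_found:
            break  # no button left: leave the loop normally
        button.click()  # errors raised by the click are not swallowed
        clicks += 1
    return clicks


# Stand-ins (assumptions for illustration) so the sketch runs without a browser.
class FakeButton:
    def click(self):
        pass


def make_finder(available):
    """Return a finder that succeeds `available` times, then raises LookupError."""
    remaining = [available]

    def find():
        if remaining[0] == 0:
            raise LookupError("no more button")
        remaining[0] -= 1
        return FakeButton()

    return find
```

With Selenium, `find_button` could be something like `lambda: WebDriverWait(browser, 3).until(expected_conditions.element_to_be_clickable((By.CLASS_NAME, 'result--more__btn')))` and `not_found` would be `TimeoutException` from `selenium.common.exceptions`; the 3 second timeout is an assumption to tune.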

You can use the HTML version of DDG at the URL https://duckduckgo.com/html/?q= . This way you can use a pure requests/BeautifulSoup approach and easily get all the pages:

import requests
from bs4 import BeautifulSoup


q = '"centre of intelligence"'
url = 'https://duckduckgo.com/html/?q={q}'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

soup = BeautifulSoup(requests.get(url.format(q=q), headers=headers).content, 'html.parser')

while True:
    for t, a, s in zip(soup.select('.result__title'), soup.select('.result__a'), soup.select('.result__snippet')):
        print(t.get_text(strip=True, separator=' '))
        print(a['href'])
        print(s.get_text(strip=True, separator=' '))
        print('-' * 80)

    f = soup.select_one('.nav-link form')
    if not f:
        break

    data = {}
    for i in f.select('input'):
        if i['type']=='submit':
            continue
        data[i['name']] = i.get('value', '')

    soup = BeautifulSoup(requests.post('https://duckduckgo.com' + f['action'], data=data, headers=headers).content, 'html.parser')
Prints:

Centre Of Intelligence - Home | Facebook
https://www.facebook.com/Centre-Of-Intelligence-937637846300833/
Centre Of Intelligence . 73 likes. Non-profit organisation. Facebook is showing information to help you better understand the purpose of a Page.
--------------------------------------------------------------------------------
centre of intelligence | English examples in context | Ludwig
https://ludwig.guru/s/centre+of+intelligence
(Glasgow was "the centre of the intelligence of England" according to the Grand Duke Alexis, who attended the launch of his father Tsar Alexander II's steam yacht there in 1880).
--------------------------------------------------------------------------------
Chinese scientists who studied bats in Aus at centre of intelligence ...
https://www.youtube.com/watch?v=UhcFXXzf2hc
Intelligence agencies are looking into two Chinese scientists in a bid to learn the true origin of COVID-19. Two Chinese scientists who studied live bats in...
--------------------------------------------------------------------------------

... and so on.
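The loop that builds `data` above indexes `i['type']` and `i['name']` directly, which raises a KeyError for any input that lacks one of those attributes. A more defensive sketch of the same step is shown below. The helper name `form_payload` is hypothetical; it accepts anything with a `.get(name, default)` method, which BeautifulSoup Tag objects and plain dicts both provide.

```python
def form_payload(inputs):
    """Build the POST data for the next-page form.

    inputs: iterable of objects supporting .get(name, default), such as
            BeautifulSoup Tag objects or plain dicts of HTML attributes.
    Submit buttons and inputs without a name are skipped.
    """
    data = {}
    for field in inputs:
        if field.get("type", "") == "submit":
            continue  # the submit button itself is not sent as data
        name = field.get("name")
        if not name:
            continue  # nothing to send for an unnamed input
        data[name] = field.get("value", "")
    return data
```

In the loop above this would be called as `data = form_payload(f.select('input'))`.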
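To click the "More Results" button at the end of the search results using Selenium, you have to use WebDriverWait with element_to_be_clickable(), as in the following: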

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.keys import Keys
    from selenium.common.exceptions import TimeoutException
    
    options = webdriver.ChromeOptions() 
    options.add_argument("start-maximized")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get('https://duckduckgo.com/')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.NAME, "q"))).send_keys("this is a search" + Keys.RETURN)
    while True:
          try:
              WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a.result--more__btn"))).click()
              print("Clicked on More Results button")
          except TimeoutException:
              print("No more More Results button")
              break
    driver.quit()
    
  • Console output:

    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    Clicked on More Results button
    No more More Results button
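The loop above ends only when the wait times out, so the final pass always costs the full 20 seconds. To make that behaviour concrete, here is a simplified polling loop in the spirit of an explicit wait. It is a sketch, not Selenium's implementation: the name `wait_until` and its parameters are assumptions for illustration, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time


def wait_until(condition, timeout=3.0, poll=0.5, ignored=(LookupError,)):
    """Call condition() repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass first. Exceptions listed
    in `ignored` are treated as "not ready yet"; anything else propagates.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result  # ready: hand the value back to the caller
        except ignored:
            pass  # not ready yet, try again after a short sleep
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

A shorter timeout makes the final "no more button" pass finish sooner, at the risk of giving up before a slow page has rendered the button.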
    


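My program keeps waiting for a new "More" button, so it clicks the "More" button until it cannot find one within 15 seconds, and it keeps doing this indefinitely. So you have to wait for the timeout, because another button may still be generated. But I think you can lower the timeout from 15 seconds to 3 seconds.

You can't run the code in codeshare; just copy it into your editor.

This works great for me. How do I make this show the browser like the original code? Edited the question to include that the solution should show the browser.

@tadon11Aaa BeautifulSoup doesn't use a browser at all. To debug the page, you can do print(soup) or print(soup.prettify()). You can redirect this output to a file and then open it manually in a browser.

Thanks! I will edit my post to specify that I want an answer that uses Selenium.

@tadon11Aaa I have no experience with Selenium, but use the URL https://duckduckgo.com/html/?q= . It doesn't use JavaScript, so I think navigation will be easier.

Awesome, this is exactly what I wanted. Can I use it with browser.get?

@tadon11Aaa driver is a variable of type WebDriver; you can give it any name, e.g. browser, browser1, browser2, and so on.

Can you add your imports to your answer? This results in NameError: name 'TimeoutException' is not defined

@tadon11Aaa My bad, I should have added the imports too. Updated now.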