Getting the page source with Selenium WebDriver in Python

python selenium-webdriver

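I am scraping some websites that work dynamically. I want to visit every page of a site and collect the page source of each page in a single list. Below is my code, which moves through all the pages and gets their page source. However, the function finishes without printing or returning anything. I have done this for other websites, but it does not work here. Please help me solve this. Thanks, everyone.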

import time

from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


def get_html(driver):
    # Assumes driver.wait was attached elsewhere (e.g. a WebDriverWait instance).
    output = []
    keep_going = True
    while keep_going:
        # Pull page HTML
        try:
            output.append(driver.page_source)
        except TimeoutException:
            pass
        try:
            # Check to see if a "next page" link exists
            keep_going = driver.find_element(
                By.CLASS_NAME, 'next ').is_displayed()
        except NoSuchElementException:
            keep_going = False
        if keep_going:
            try:
                # Wait until the link is clickable, then move to the next page
                driver.wait.until(EC.element_to_be_clickable(
                    (By.CLASS_NAME, 'next '))).click()
                time.sleep(3)
            except TimeoutException:
                keep_going = False
    print(len(output))
    return output


raw_data = get_html(driver)
print(str(len(raw_data)) + " listing found")

This is the error output I get:

> Entering search term number 1 out of 1
> Traceback (most recent call last):
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 114, in <module>
>     raw_data = get_html(driver)
>   File "E:/Harshitha/python learning/python/New/rough1.py", line 65, in get_html
>     output = (driver.page_source).encode('utf-8')
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 670, in page_source
>     return self.execute(Command.GET_PAGE_SOURCE)['value']
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 312, in execute
>     self.error_handler.check_response(response)
>   File "C:\Users\Harshitha\Anaconda3\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 237, in check_response
>     raise exception_class(message, screen, stacktrace)
> selenium.common.exceptions.WebDriverException: Message: chrome not reachable
>   (Session info: chrome=63.0.3239.132)
>   (Driver info: chromedriver=2.34.522940 (1a76f96f66e3ca7b8e57d503b4dd3bccfba87af1),platform=Windows NT 10.0.16299 x86_64)
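
The traceback ends in a WebDriverException ("chrome not reachable"), which an `except TimeoutException` clause does not catch, so the exception leaves get_html before anything is printed or returned. A minimal sketch of one way to keep the pages collected so far is shown below. The names `collect_pages`, `go_to_next_page`, and `max_pages` are hypothetical and not part of the original code, and the import fallback exists only so the sketch can run without Selenium installed.

```python
try:
    from selenium.common.exceptions import WebDriverException
except ImportError:
    # Fallback so this sketch can be run without Selenium installed.
    WebDriverException = Exception


def collect_pages(driver, go_to_next_page, max_pages=100):
    """Collect page sources until paging stops or the browser session fails.

    go_to_next_page is a hypothetical callback: it should click the "next"
    link and return True, or return False when there is no further page.
    """
    pages = []
    for _ in range(max_pages):
        try:
            pages.append(driver.page_source)
            if not go_to_next_page(driver):
                break
        except WebDriverException as exc:
            # Errors such as "chrome not reachable" land here;
            # stop paging but keep what was already collected.
            print("Stopping after %d pages: %s" % (len(pages), exc))
            break
    return pages
```

This only makes the failure visible and preserves partial results. It does not explain why the browser session became unreachable, which would still need to be investigated separately.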

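I use the page_source property:

driver.page_source

Comments:

Possibly a duplicate.

This is Python code, and that raises an error: 'WebDriver' object has no attribute 'get_source'.

Try driver.get_source without …

I changed it; the attribute in Python is page_source.

Even when I use only page_source, the result is not returned from the function. I have used this on 5-6 websites; it worked on 3 of them but not on the others, and I don't know why...

Maybe the problem is the encoding to UTF-8 in output = driver.page_source.encode('utf-8'). Can you try encoding with unicode instead?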
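
The encoding suggestion can be checked in isolation. In Python 3, page_source returns a str, and calling .encode('utf-8') on it produces bytes. The traceback above shows the exception being raised inside page_source itself, before .encode runs, so the encoding step is unlikely to be the cause of that particular error. The sketch below uses a literal string as a stand-in for driver.page_source; `encode_pages` and `sample_pages` are hypothetical names used only for illustration.

```python
def encode_pages(pages):
    """Return UTF-8 encoded bytes for each page source string."""
    return [page.encode("utf-8") for page in pages]


# Stand-in for values returned by driver.page_source (str in Python 3).
sample_pages = ["<html><body>caf\u00e9</body></html>"]

encoded = encode_pages(sample_pages)

# The collected pages stay str; only the encoded copies are bytes.
print(type(sample_pages[0]).__name__, type(encoded[0]).__name__)
```

Note also that the line in the traceback, `output = (driver.page_source).encode('utf-8')`, rebinds `output` rather than appending to a list, so only one page would survive the loop even without the error.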