Web scraping python3.7 - PhantomJS - Driver.get(url) with 'Window handle/name is invalid or closed?'

Scraping a website with two functions causes a driver.get error

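I have tried different variations of while and for loops to get this to work. Now I am getting a driver.get error. The initial function works on its own, but when I run the two functions one after the other, I get this error.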

import requests, sys, webbrowser, bs4, time
import urllib.request
import pandas as pd
from selenium import webdriver
driver = webdriver.PhantomJS(executable_path = 'C:\\PhantomJS\\bin\\phantomjs.exe')
jobtit = 'some+job'
location = 'some+city'
urlpag = ('https://www.indeed.com/jobs?q=' + jobtit + '&l=' + location + '%2C+CA')



def initial_scrape():
    data = []
    try:
        driver.get(urlpag)
        results = driver.find_elements_by_tag_name('h2')
        print('Finding the results for the first page of the search.')
        for result in results: # loop 2
            job_name = result.text
            link = result.find_element_by_tag_name('a')
            job_link = link.get_attribute('href')
            data.append({'Job' : job_name, 'link' : job_link})
            print('Appending the first page results to the data table.')
            if result == len(results):
                return
    except Exception:
        print('An error has occurred when trying to run this script.  Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data


def second_scrape():
    data = []
    try:
        #driver.get(urlpag)
        pages = driver.find_element_by_class_name('pagination')
        print('Variable nxt_pg is ' + str(nxt_pg))
        for page in pages:
            page_ = page.find_element_by_tag_name('a')
            page_link = page_.get_attribute('href')
            print('Taking a look at the different page links..')
            for page_link in range(1,pg_amount,1):
                driver.click(page_link)
                items = driver.find_elements_by_tag_name('h2')
                print('Going through each new page and getting the jobs for ya...')
                for item in items:
                    job_name = item.text
                    link = item.find_element_by_tag_name('a')
                    job_link = link.get_attribute('href')
                    data.append({'Job' : job_name, 'link' : job_link})
                    print('Appending the jobs to the data table....')
                if page_link == pg_amount:
                    print('Oh boy! pg_link == pg_amount...time to exit the loops')
                    return
    except Exception:
        print('An error has occurred when trying to run this script.  Please see the attached error message and screenshot.')
        driver.save_screenshot('screenshot.png')
        driver.close()
    return data

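Expected:

Initial function

Gets the website from urlpag

Finds the elements by tag name and loops through them, appending each one to a list

Once it is done with all the elements, exits and returns the list

Second function

While still on urlpag, finds the element by class name and gets the links for the next pages

Once we have every page that needs scraping, goes through each page and appends the elements to a different table

Once we reach the pg_amount limit, exits and returns the final list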

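Actual:

Initial function

Gets the website from urlpag

Finds the elements by tag name and loops through them, appending each one to a list

Once it is done with all the elements, exits and returns the list

Second function

Finds the pagination class, prints the nxt_pg variable, and then throws the error below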
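The expected flow described above can be sketched roughly as follows. This is only a sketch, not the asker's code: the function names, the `max_pages` parameter, and the `.pagination a` selector are assumptions made for illustration. The locator strings `"tag name"` and `"css selector"` are the values behind Selenium's `By.TAG_NAME` and `By.CSS_SELECTOR`, passed to `find_elements(by, value)`. The main idea is to read the pagination links as plain strings before navigating, since element references are no longer usable once the driver loads another page.

```python
from urllib.parse import urljoin


def scrape_jobs(driver):
    """Return one {'Job', 'link'} dict per <h2> that contains a link."""
    rows = []
    for heading in driver.find_elements("tag name", "h2"):
        anchors = heading.find_elements("tag name", "a")
        if not anchors:  # skip headings without a link
            continue
        rows.append({"Job": heading.text,
                     "link": anchors[0].get_attribute("href")})
    return rows


def pagination_urls(driver, base_url):
    """Collect pagination hrefs as plain strings, in order, without duplicates."""
    urls = []
    # ".pagination a" is an assumed selector for the links inside the pager.
    for anchor in driver.find_elements("css selector", ".pagination a"):
        href = anchor.get_attribute("href")
        if href:
            url = urljoin(base_url, href)
            if url != base_url and url not in urls:
                urls.append(url)
    return urls


def scrape_all(driver, start_url, max_pages=5):
    """Scrape the first page, then at most max_pages - 1 further pages."""
    driver.get(start_url)
    data = scrape_jobs(driver)
    # Read the links before navigating away from the first page.
    extra = pagination_urls(driver, start_url)[:max(max_pages - 1, 0)]
    for url in extra:
        driver.get(url)
        data.extend(scrape_jobs(driver))
    return data
```

With the driver from the question, this would be called as `scrape_all(driver, urlpag, max_pages=3)`. Because the driver is passed in as an argument, the same functions work with PhantomJS, Chrome, or a stand-in object in tests.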

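For anyone who runs into this error: I ended up switching to chromedriver and using it for the web scraping instead. It seems that the PhantomJS driver sometimes returns this error.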
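A minimal sketch of what that switch might look like, assuming Chrome is installed and a matching chromedriver is available to Selenium. The helper names `chrome_arguments` and `make_chrome_driver`, the headless default, and the window size are assumptions for illustration and do not come from the answer.

```python
def chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build the Chrome command-line switches used by make_chrome_driver."""
    args = ["--window-size={},{}".format(*window_size)]
    if headless:
        args.append("--headless")  # no visible window, similar to PhantomJS
    return args


def make_chrome_driver(headless=True):
    """Start Chrome through Selenium with the switches above."""
    # Imported inside the function so chrome_arguments has no dependencies.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

In the question's script, `driver = make_chrome_driver()` would take the place of the `webdriver.PhantomJS(...)` line, and the rest of the code could stay as it is.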

@QHarr There was an error when I pasted the code here. The two variables come before urlpag in the actual script, and now they do here as well (I have updated the post). Thanks.

Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\Scripts\Indeedscraper\indeedscrape.py", line 23, in initial_scrape
    driver.get(urlpag)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchWindowException: Message: {"errorMessage":"Currently Window handle/name is invalid (closed?)"
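Selenium raises NoSuchWindowException when a command is sent to a window that no longer exists. In the question's code, both except branches call driver.close() on the shared driver, so one possible cause (an assumption; the thread does not confirm it) is that an earlier failure closed the only window and a later driver.get then had nothing to talk to. A hedged sketch of one way to avoid that is to leave the driver open while the functions run and shut it down exactly once at the end. The names `managed_driver` and `run_steps` are hypothetical helpers, not part of Selenium.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and quit it exactly once on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        # quit() ends the whole session; close() only closes the current window.
        driver.quit()


def run_steps(driver, steps, screenshot_path="screenshot.png"):
    """Run each step(driver) in order; on failure save a screenshot and re-raise."""
    results = []
    for step in steps:
        try:
            results.append(step(driver))
        except Exception:
            driver.save_screenshot(screenshot_path)
            raise  # keep the original traceback instead of hiding it
    return results
```

Here `factory` would be something like a lambda that builds the PhantomJS or Chrome driver, and each step a scraping function that takes the driver as its argument. Re-raising rather than printing a generic message also keeps the real exception visible, which makes a problem like the one above easier to track down.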