Python: Web scraping with Selenium - element not found


python python-2.7 selenium web-scraping


I am trying to scrape this website:

I want to get the names and addresses of the clinics. This is the Python code I am using:

from selenium import webdriver
import pandas as pd
import time 

#driver = webdriver.Chrome()
specialty   = ["behavioral-health","dermatology","colon","ear-nose-and-throat","endocrine","express","family-practice","foot-and-ankle",
           "gastroenterology","heart-%26-vascular","hepatobiliary-and-pancreas","infectious-disease","inpatient","internal-medicine",
           "neurology","nutrition","ob%2Fgyn","occupational-medicine","oncology","orthopedics","osteoporosis","pain-management",
           "pediatrics","plastic-surgery","pulmonary","rehabilitation","rheumatology","sleep","spine","sports-medicine","surgical","urgent-care",
           "urology","weight-loss","wound-care","pharmacy"]
name = []
address = []

for q in specialty: 
    driver = webdriver.Chrome()
    driver.get("https://www.novanthealth.org/home/patients--visitors/locations/clinics.aspx?"+q+"=yes")
    x = driver.find_element_by_class_name("loc-link-right")
    num_page = str(x.text).split(" ")
    x.click() 

    for i in num_page:
        btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']')
        btn.click() 
        time.sleep(8)  # TODO: replace this fixed sleep with an explicit wait
        temp = driver.find_element_by_class_name("gray-background").text
        temp0 = temp.replace("Get directions Website View providers\n","")

        x_temp = temp0.split("\n\n\n")

        for j in range(0,len(x_temp)-1):
            temp1 = x_temp[j].split("Phone:")
            name.append(temp1[0].split("\n")[1])
            temp3 = temp1[1].split("Office hours:")
            temp4 = temp3[0].split("\n")
            temp5 = temp4[1:len(temp4)]
            address.append(" ".join(temp5))
    driver.close()

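The code works fine when I run it for one specialty at a time, but when I pass the specialties through the loop shown above, it fails on the second iteration with the following error: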

Traceback (most recent call last):
  File "<stdin>", line 10, in <module>
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 77, in click
    self._execute(Command.CLICK_ELEMENT)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webelement.py", line 493, in _execute
    return self._parent.execute(command, params)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 249, in execute
    self.error_handler.check_response(response)
  File "C:\Anaconda2\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 193, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.ElementNotVisibleException: Message: element not visible
(Session info: chrome=46.0.2490.80)
(Driver info: chromedriver=2.19.346078 (6f1f0cde889532d48ce8242342d0b84f94b114a1),platform=Windows NT 6.1 SP1 x86_64)


I don't have much experience with Python, so any help would be appreciated.

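The error message tells you why it is not working:

ElementNotVisibleException: Message: element not visible

The element is not visible unless you scroll down to bring it into view.

You have to scroll down the list, by an amount that depends on the size of the browser window,

or

simply extract the data from the page source, which is easier.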
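A minimal Python sketch of the scrolling suggestion, assuming the pagination button already exists in the DOM and is merely outside the visible part of the window. The helper name click_after_scrolling is hypothetical; it relies only on the WebDriver method execute_script, the WebElement method click, and the browser's standard scrollIntoView DOM call. If the button is hidden for another reason (for example, it has not been rendered yet), scrolling alone will not help.

```python
def click_after_scrolling(driver, element):
    """Scroll `element` into the viewport, then click it.

    `driver` is a Selenium WebDriver and `element` is a WebElement
    that this driver returned earlier.
    """
    # Inside the script, arguments[0] is the element passed as the
    # extra argument to execute_script.
    driver.execute_script("arguments[0].scrollIntoView(true);", element)
    element.click()


# Hypothetical use inside the loop from the question:
#     btn = driver.find_element_by_xpath(
#         '//*[@id="searchResults"]/div[2]/div[2]/button[' + i + ']')
#     click_after_scrolling(driver, btn)
```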

I normally use Selenium Basic, an Excel add-in. You can use the same logic in Python. I tried this in VBA and it worked well for me:

Private assert As New assert
Private driver As New Selenium.ChromeDriver

Sub sel_novanHealth()
Set ObjWB = ThisWorkbook
Set ObjExl_Sheet1 = ObjWB.Worksheets("Sheet1")
Dim Name As Variant

   'Open the website
    driver.get "https://www.novanthealth.org/home/patients--visitors/locations.aspx"

    driver.Window.Maximize

    driver.Wait (1000)

    'Find out the total number of pages to be scraped
    lnth = driver.FindElementsByXPath("//button[@class='paginate_button']").Count
   'Running the Loop for the Pages
    For y = 2 To lnth
            'Running the Loop for the Elements
            For x = 1 To 10
                Name = driver.FindElementsByXPath("//div[@class='span12 loc-heading']")(x).Text
                ' Element 2
                 'Element 3
            Next x
                driver.FindElementsByXPath("//button[@class='paginate_button']")(y).Click
    Next y

        driver.Wait (1000)


End Sub
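The same logic might look roughly like this in Python. This is a sketch rather than a tested scraper: the two XPath expressions are copied from the VBA code above, the function name scrape_clinic_names is hypothetical, and it assumes that the paginate_button elements are exactly the numbered page buttons, in page order. Unlike the VBA loop, it also reads the last page after clicking to it. The locator strategy is passed as the plain string "xpath", which is the value Selenium's By.XPATH constant holds.

```python
HEADING_XPATH = "//div[@class='span12 loc-heading']"
BUTTON_XPATH = "//button[@class='paginate_button']"


def scrape_clinic_names(driver):
    """Collect the heading text from every results page."""
    names = []
    page_count = len(driver.find_elements("xpath", BUTTON_XPATH))
    for page in range(page_count):
        if page > 0:
            # Look the buttons up again on every pass: references found
            # earlier can go stale once the page redraws.
            driver.find_elements("xpath", BUTTON_XPATH)[page].click()
        for heading in driver.find_elements("xpath", HEADING_XPATH):
            names.append(heading.text)
    return names
```

On the real site a wait would most likely be needed after each click, so that the new page of results has loaded before the headings are read.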

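You have to make your web driver wait a few seconds until the element in question appears on the page. Take a look at the WebDriverWait function.

I have read the documentation for that function, but I am having trouble implementing it. Could you give some sample code? Thanks!

Here it is

@AvinashRaj I added wait = WebDriverWait(driver, 10) wait.until(EC.presence_of_element_located((By.ID, "searchResults"))) above btn = driver.find_element_by_xpath('//*[@id="searchResults"]/div[2]/div[2]/button['+i+']'). This time it ran for 2 iterations, but it gave the same error on the third iteration.

@Vaibhav: It is worth avoiding asking directly for "sample code" here. It is often read as "would you do my work for me", even if that is not the real intent.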
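One possible reason the wait described in the comments did not help: presence_of_element_located only checks that an element is in the DOM, and the condition was applied to the searchResults container rather than to the button being clicked. The exception, however, is about visibility. Selenium's expected_conditions module also provides visibility_of_element_located and element_to_be_clickable, which could be passed to WebDriverWait for the button's own XPath. The sketch below is a hand-rolled stand-in for that idea, not Selenium's own implementation: the name wait_until_displayed and its parameters are hypothetical, and it uses only the find_elements and is_displayed methods.

```python
import time


def wait_until_displayed(driver, xpath, timeout=10.0, poll_interval=0.5):
    """Poll until an element matching `xpath` is displayed, then return it.

    Raises RuntimeError if nothing matching is displayed within
    `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        # find_elements returns an empty list instead of raising when
        # nothing matches yet, so no exception handling is needed here.
        for element in driver.find_elements("xpath", xpath):
            if element.is_displayed():
                return element
        if time.time() >= deadline:
            raise RuntimeError(
                "no displayed element for %r after %s seconds" % (xpath, timeout))
        time.sleep(poll_interval)


# Hypothetical use in place of time.sleep(8) in the question's loop:
#     btn = wait_until_displayed(
#         driver, '//*[@id="searchResults"]/div[2]/div[2]/button[' + i + ']')
#     btn.click()
```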