Scraping a lightbox overlay with Selenium and Beautiful Soup in Python

Tags: python, selenium, beautifulsoup

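I'm having some trouble with my code. I'm trying to use Selenium, Beautiful Soup and Python to scrape content that appears in an overlay, or lightbox. I'm not sure how the overlay is created, but I think it's Ajax.

When I run the following Python 2.7 code, the Firefox browser opens, navigates to the page, clicks the correct link and shows the overlay to the user. I can inspect its tags and markup in Firefox, but I don't know how to get Python to access the overlay.

Any help would be appreciated by this newbie.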

#Import the beautiful soup library 
from bs4 import BeautifulSoup
# import urllib2 library to actually go get the webpage for Beautiful Soup
import urllib2

#Import Selenium and the code needed to wait for the page to load
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


URLToParse ='http://courses.it-tallaght.ie/'

#Open the webpage using Soup to get the list of departments so we can iterate on them 
soup = BeautifulSoup(urllib2.urlopen(URLToParse))
#Open the webpage using selenium
driver = webdriver.Firefox()
driver.get(URLToParse)
subset = driver.find_element_by_id('homeProgrammes')


#Just get the part of the document that contains the list of department 
Depts = soup.find(id="homeProgrammes")
# For all the links in the div with id homeProgrammes 
for links in Depts.findAll('a'): 
    #Using selenium find the link to the depts list of courses that matches the link string from beautiful soup and click it
    FollowLink = subset.find_element_by_link_text(links.string)
    FollowLink.click()
    # Try waiting 10 seconds for the element with ID 'ProgrammeListForDepartment' becomes available and print the contents using prettify
    try: 
        element = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'ProgrammeListForDepartment')))
        Overlay = BeautifulSoup(driver.find_element_by_id('ProgrammeListForDepartment'))
        print(Overlay.prettify())
    except NoSuchElementException as e: 
            print(NoSuchElementException.msg())

You don't need BeautifulSoup here at all. Selenium on its own is powerful enough for this.

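Here is working code that iterates over all the departments, clicks each one, extracts the list of courses and closes the overlay window. The results are collected into a dictionary: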

from pprint import pprint

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url ='http://courses.it-tallaght.ie/'

driver = webdriver.Firefox()
driver.get(url)

courses = {}
for department_link in driver.find_elements_by_css_selector("div#homeProgrammes a[onclick]"):
    department = department_link.text

    # open department
    department_link.click()

    # grab a list of courses
    overlay = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, 'ProgrammeListForDepartment')))
    courses[department] = [course_link.text for course_link in overlay.find_elements_by_css_selector("ol > li > a")]

    # close department
    overlay.find_element_by_link_text("close").click()

pprint(courses)

driver.close()

It prints:

{u'Accounting & Prof. Studies': [u'Accounting Technician (ATI)',
                                 u'APICS Certificate in Production and Inventory Management (CPIM)',
                                 u'APICS Certified Supply Chain professional (CSCP)',
                                 u'Bachelor of Business (Honours) in Accounting & Finance',
                                 u'Bachelor of Business (Honours) in Accounting & Finance',
                                 u'Bachelor of Business in Accounting & Finance',
                                 u'Bachelor of Business in Accounting & Finance',
                                 u'Foundation Certificate in Personnel Practice (CIPD)',
                                 u'Foundation Diploma in Human Resource Practice (CIPD)',
                                 u'Higher Certificate in Business in Accounting',
                                 u'Higher Certificate in Business in Real Estate (Valuation, Sale and Management)'],
 u'Computing': [u'Bachelor of Science (Honours) in Computing',
                u'Bachelor of Science (Honours) in Computing',
                u'Bachelor of Science (Honours) in IT Management',
                u'Bachelor of Science (Honours) IT Management',
                u'Bachelor of Science in Computing',
                u'Bachelor of Science in Computing',
                u'Bachelor of Science in IT Management',
                u'Certificate in Cloud Computing Applications Development',
                u'Certificate in Cloud Computing Infrastructure Management',
                u'Certificate in Fundamentals of Software Development (Minor Award)',
                u'Certificate in Network Design and Implementation',
                u'Higher Certificate in Science in Information Technology',
                u'Higher Certificate in Science in IT Management',
                u'Higher Diploma in Science in Computing',
                u'M. Sc. in Distributed and Mobile Computing',
                u'M.Sc. in Information Technology Management',
                u'PhD in Information Technology',
                u'Postgraduate Diploma in Distributed and Mobile Computing',
                u'Postgraduate Diploma in Information Technology Management',
                u'Postgraduate Diploma in Science in Info Technology Management Information Technology Management'],
 u'Electronic Engineering': [u'Bachelor Degree in Engineering (Honours) in Electronic Engineering',
                             u'Bachelor of Engineering (Honours) in Electronic Engineering',
                             u'Bachelor of Engineering in Electronic Engineering',
                             u'Bachelor of Engineering In Electronic Engineering',
                             u'Cisco CCNA Routing & Switching',
                             u'Higher Certificate in Engineering in Electronic Engineering',
                             u'Masters of Engineering in Electronic Engineering in Electronic System Design',
                             u'Single Subject Certificate Structured Analogue Design'],
 u'External Services': [u'Access English',
                        u'Pre-Start Academic English',
                        u'Pre-Start Maths'],
 u'Humanities': [u'Bachelor of Arts (Honours) in Creative Digital Media',
                 u'Bachelor of Arts (Honours) in European Studies',
                 u'Bachelor of Arts (Honours) International Hospitality & Tourism Management',
                 u'Bachelor of Arts (Honours) Social Care Practice',
                 u'Bachelor of Arts (Ordinary) International Hospitality and Tourism Management',
                 u'Bachelor of Arts in Culinary Arts',
                 u'Bachelor of Arts in International Hospitality and Tourism Management',
                 u'English as a Foreign Language',
                 u'Higher Cert in Arts in International Hospitality & Tourism Operati in Int Hosp & Tourism Operations',
                 u'Higher Certificate in Arts in Culinary Arts'],
 u'Management': [u'Bachelor of Business (Honours) in Management',
                 u'Bachelor of Business (Honours) in Management',
                 u'Bachelor of Business in Management',
                 u'Bachelor of Science (Honours) in the Management of Innovation and Technology',
                 u'Bachelor of Science in the Management of Innovation and Technology',
                 u'Higher Certificate in Business in Business Administration',
                 u'International Digital Management & Sales',
                 u'TA_BMNGT_D - Bachelor of Business in Management'],
 u'Marketing': [u'Bachelor of Arts (Honours) in Advertising & Marketing Communications',
                u'Bachelor of Arts in Advertising and Marketing Communications',
                u'Bachelor of Business (Honours) in Marketing',
                u'Bachelor of Business (Honours) in Marketing Management',
                u'Bachelor of Business in Marketing',
                u'Bachelor of Business in Marketing',
                u'BSc in Data Analytics with Digital Marketing',
                u'Higher Certificate in Business in Marketing',
                u'Higher Diploma in Business in Marketing'],
 u'Mechanical Engineering': [u'B.Eng(Hons) in Mechanical Engineering',
                             u'Bachelor of Engineering (Honours) in Mechanical Engineering',
                             u'Bachelor of Engineering in Energy and Environmental Engineering',
                             u'Bachelor of Engineering in Mechanical Engineering',
                             u'Bachelor of Science (Honours) in Energy Systems Engineering',
                             u'Bachelor of Science (Hons) in Energy Systems Engineering',
                             u'Certificate in Project Management (IPMA)',
                             u'EIQA Diploma in Quality Management Quality Management',
                             u'Higher Certificate in Engineering in Mechanical Engineering',
                             u'Master of Engineering in Mechanical Engineering'],
 u'Science': [u'Bachelor of Science (Honours) in Bioanalytical Science',
              u'Bachelor of Science (Honours) in Bioanalytical Science',
              u'Bachelor of Science (Honours) in Pharmaceutical Science',
              u'Bachelor of Science (Hons) in Sports Science and Health',
              u'Bachelor of Science (Hons) in Sports Science and Health (1 Year Add-On)',
              u'Bachelor of Science Hons in DNA and Forensic Analysis',
              u'Bachelor of Science in Bio Analysis (1 year add-on Bachelor Degree)',
              u'Bachelor of Science in Bioanalysis or Chemical Analysis',
              u'Bachelor of Science in Chemical Analysis',
              u'Bachelor of Science in DNA and Forensic Analysis',
              u'Bachelor of Science in Pharmaceutical Science',
              u'Bachelor of Science in Pharmaceutical Technology',
              u'Bachelor of Science in Sports Science and Health',
              u'Bachelor of Science in Sterile Services Management',
              u'Certificate in Bioprocessing and Cleanroom Management - Minor Award',
              u'Certificate in Food Science and Technology Minor Award',
              u'Certificate in GMP & Regulatory Affairs (MIN) in GMP & Technology',
              u'Certificate in GMP and Medical Device Manufacture (Minor Award)',
              u'Copy of TA_SSPPM_B - Certificate in Pharmaceutical and Medical Device Manufacturing (Special Purpose Award)v2',
              u'Higher Certificate in Science Contamination Control and Asepsis for the Healthcare Sector',
              u'Higher Certificate in Science in Bio & Pharmaceutical Analysis',
              u'Higher Certificate in Science in GMP and Technology',
              u'Higher Certificate in Science in Process Technologies',
              u'Higher Diploma in Food Science and Technology',
              u'Higher Diploma in Science in Pharmaceutical Manufacturing',
              u'Masters in Pharmaceutical Manufacturing & Process Technology',
              u'PhD in Science in Biology',
              u'PhD in Science in Chemistry']}

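Thanks @alecxe. Apart from using too much code and too many libraries, I'm not sure where I went wrong in accessing the overlay. Could I ask you to explain?

@Michael you were on the right track overall. If you want to parse the overlay content with BeautifulSoup, you would have to replace BeautifulSoup(driver.find_element_by_id('ProgrammeListForDepartment')) with BeautifulSoup(driver.find_element_by_id('ProgrammeListForDepartment').get_attribute("outerHTML")).

@Michael yeah, I would use bs_element = BeautifulSoup(selenium_web_element.get_attribute("outerHTML")).

Sorry @alecxe, I only realised how to do it after posting. Thanks very much for your help with this.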
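
The fix discussed in the comments comes down to giving BeautifulSoup a string of HTML rather than a Selenium WebElement. Below is a minimal sketch of that step in isolation, written for Python 3 and a current bs4. The sample markup and the helper name course_names are assumptions made for illustration; the real overlay's structure is only inferred from the "ol > li > a" selector and the "close" link used in the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the string that
# element.get_attribute("outerHTML") would return for the overlay.
# The real page's markup may differ.
SAMPLE_OUTER_HTML = """
<div id="ProgrammeListForDepartment">
  <a href="#">close</a>
  <ol>
    <li><a href="#">Bachelor of Science in Computing</a></li>
    <li><a href="#">Higher Diploma in Science in Computing</a></li>
  </ol>
</div>
"""


def course_names(outer_html):
    """Return the link texts found in the overlay's ordered list."""
    # BeautifulSoup parses markup text, so it needs the HTML string,
    # not the WebElement object itself.
    soup = BeautifulSoup(outer_html, "html.parser")
    # Same selector the answer uses with Selenium; the "close" link sits
    # outside the <ol>, so it is not matched.
    return [a.get_text(strip=True) for a in soup.select("ol > li > a")]


if __name__ == "__main__":
    for name in course_names(SAMPLE_OUTER_HTML):
        print(name)
```

With a live driver, the argument would instead be the outerHTML of the element returned by the wait in the question's code, taken only after the overlay has loaded.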