Selenium: How do I find the XPath expression for this website?


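I want to scrape this web page.

I have a lot of resumes, and my task is to collect the skills listed in each one. Here is the link to the page ==>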


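You can use your browser's developer tools to find the XPath of any HTML element. The steps below are for Chrome, but they are very similar in other browsers:

  • Right-click the item on the page whose XPath you want to know
  • Click "Inspect". This opens the developer tools with the corresponding element highlighted
  • If the highlighted element is not the one you are after, navigate through the interactive HTML tree that is shown. Usually, when you hover over an element there, the matching item is highlighted on the main page
  • Right-click the HTML element in the tree
  • Choose "Copy -> Copy XPath"

The main problem you are likely to run into when scraping these pages is that your target may not be in the same place every time. If the documents are created by users, each one may have a different layout, so a single XPath may not match the same section on every page. If that is the case, you may need a more sophisticated approach (jQuery, Selenium, Cypress and others all have tools for searching by text content, navigating between parent and child elements, and so on).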
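As a rough illustration of searching by text content instead of by position, here is a minimal sketch. The two page snippets, the `section`/`h2` markup and the `extract_skills` helper are all made up for the example, and Python's standard-library `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed markup) stands in for a real browser. With Selenium, a similar expression starting with `//` could be passed to `driver.find_element(By.XPATH, ...)`.

```python
import xml.etree.ElementTree as ET

# Two hypothetical resume pages whose sections appear in a different order
# and at a different nesting depth.
PAGE_A = """<html><body>
  <section><h2>Summary</h2><div>Motivated engineer</div></section>
  <section><h2>Skills</h2><div>Python, SQL, XML</div></section>
</body></html>"""

PAGE_B = """<html><body>
  <div id="wrapper">
    <section><h2>Skills</h2><div>C++, Java</div></section>
    <section><h2>Summary</h2><div>Systems developer</div></section>
  </div>
</body></html>"""

# Anchor on the heading text rather than on the element's position:
# "the div inside whichever section has an h2 that reads 'Skills'".
SKILLS_XPATH = ".//section[h2='Skills']/div"


def extract_skills(page_source):
    """Return the text of the skills block, or None if it is not found."""
    root = ET.fromstring(page_source)
    node = root.find(SKILLS_XPATH)
    if node is None or node.text is None:
        return None
    return node.text.strip()


print(extract_skills(PAGE_A))  # Python, SQL, XML
print(extract_skills(PAGE_B))  # C++, Java
```

The same expression finds the skills block in both layouts because it depends on the heading text, not on how many elements come before it.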

You don't actually need Selenium here. You can do this easily with BeautifulSoup. Here is the complete code:

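    import requests
    from bs4 import BeautifulSoup

    # Fetch the search results page for "software engineer" resumes
    r = requests.get('https://www.livecareer.com/resume-search/search?jt=software%20engineer').text

    soup = BeautifulSoup(r, 'html.parser')

    # The results are <li> items inside this <ul>; the first <li> is skipped
    ul = soup.find('ul', class_='resume-list list-unstyled')

    li_items = ul.find_all('li')[1:]

    # Build an absolute URL for every resume in the list
    links = []

    for li in li_items:
        links.append('https://www.livecareer.com/' + li.a['href'])

    # Visit each resume and collect the text of its first 'field singlecolumn' div
    skills = []

    for link in links:
        r = requests.get(link).text
        soup = BeautifulSoup(r, 'html.parser')
        div = soup.find('div', class_='field singlecolumn')
        # Keep skills aligned with links even if a page lacks the div
        skills.append(div.text if div is not None else '')

    print(skills)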
    
Output:

    ['agile, AutoCAD, C++, CAD, Oral, data entry, database, Engineer in Training, EIT, Engineering analysis, XML, functional, GUI, HTML, JavaScript, Team leadership, Lockheed Martin, macros, Manufacturing processes, MATLAB, mechanical, meetings, Excel, Organizational skills, presentations, Process improvement, program management, programming, Project planning, Python, research, scrum, Six Sigma, Software development, Solidworks, SQL, switches, telemetry, video, Web design, website, Written communication', "Senior Outreach at Senior Center\xa0Planned and organized a joint celebration of the Chinese New Year with the collaboration of the Westborough Public Schools. \xa0Promoted cultural awareness and broke the language barrier of different races of backgrounds.Volunteer at ChurchIdentified problems and implemented a process to eliminate a data-entry camp registration process by 100% by building new online Registration Forms for registration and the student's cultural classes arrangement.Designed posters, flyers and presentation slides with graphics and photos for the different organizational events with Microsoft Word and PowerPointDeveloped a structural documentation on publishing an annual report with detailed steps and instructions on the process that are easy to follow and quickly learn by others. \xa0Implemented a 30th Anniversary Special Edition project in a commercial quality of work with excellent time management skills to meet the deadline.Cayenne SoftwareExperienced team spirit in effort of reducing the workload of bugs fixes on the software product.\u200bAllmerica Financial CompanyProvided a sole support to the Hanover 1099 system with strong commitment and responsibility. Fined tuned the system resulting in cost savings for the Allmerica Financial Company. Winner of the Gold Crown Customer Recognition award.", 'Motivated Software Engineer seeking employment as part of a dynamic software development team. 
Fluent in C,C++,JAVA and python.', 'Developed peer-to-peer secure file transfer system in JAVA.This involved the application of symmetric\r\n     and asymmetric key cryptography algorithms, and JAVA concepts like multi-threading, socket\r\n     programming, etc.Implemented a system to query XML in JAVA.The query language was a subset of XPath\r\n    Modeled a project "Personal Health Management System" using UML and implemented it in Visual C#.The code was tested using NUnit.Object oriented software development process was used for this\r\n     project\r\n    Developed a \'license plate game\' in C on LINUX o/s using client/server architecture.This required the\r\n     application of distributed programming concepts like Sockets, RPC, multi-threading, etc.RESEARCH PAPER:\r\n    XMorph: A Shape-Polymorphic, Domain-Specific XML Data Transformation Language,\r\n     International Conference on Data Engineering (ICDE 2010), IEEE CS, Los Angeles, USA, March 2010.', 'Performance evaluation of In-Kernel System Call Implemented and evaluated In-kernel system call using dynamic loadable kernel module on x_86_64 architecture.Re-Development free approach to migrate Java applications to cloud at College of Engineering, Pune Implemented file access sub-system of a WebJDK which leverages the File-System API provided by HTML5.This allows the use of standard Java APIs for accessing client files.2016 2013.', 'Accomplished Computer Technician with a rapidly increasing range of industry experience looking to bring strong instincts and a proven record of procedural compliance, process management and strong operational skills to a rapidly growing company. 
', 'Seeking a fulltime position as a Developer / Systems Admin / DBA for a company needing a hard working, \r\ntaskoriented person with an indepth understanding of software development and database tuning.', '3 Years of experience in Information Technology with emphasis on Design, Development and End to End Implementation of Consulting based solutions with expertise on working with Object Orient Analysis and Design using Java/J2EE Technologies viz. JSP/Servlets/EJB,JDBC, Web services , Web sockets, Spring Frameworks, Spring-boot, Angular, JQuery, XML/XSLT, JSON, Integration Developer Service Component Architecture & Service Data Objects, Rational Application Developer, Test Driven Development using JUnit, Jenkins, GIT, Cloud Foundry,  Eclipse/Intelij IDE, UNIX, Gradle Scripts, DB2/Oracle/MySQL Databases.', '.NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP.NET 3.5, AJAX, ASM, Banking, Basic, Business Objects, c, CSS, CSS 2, customer satisfaction, data analysis, Database, delivery, EBusiness, editor, Electronics, HP, HTML 4, HTML, IDE, IIS 7.0, ITIL, JavaScript, C#, C# 3.0, Windows, windows applications, 2000, 3.1, Windows 98, Enterprise, Oct, Operating systems, Oracle 9, Oracle database, PL/SQL, personnel, programming, recording, reporting, sales, Servers, Service Level Agreement, SLA, Visual SourceSafe, Visual  SourceSafe, SQL, SQL Server, technical support, TOAD, UNIX, vi, Microsoft Visual Studio, Visual studio, Windows server', 'Represent Stanford  Ballroom Dance team in various competitions in the Bay area.\r\n*Represented University of Maryland in Ballroom dance competitions in UMD, UPenn, MIT, Columbia University & Ohio \r\n*Have a keen interest in photography, especially of dancers in motion.']
    
                                                                                                               
    
You can also turn this into a nice DataFrame (for better readability) by adding the following lines to your code:

    import pandas as pd

    # One row per resume: its URL and the text scraped from it
    dictionary = {'Links': links,
                  'Skills': skills}

    df = pd.DataFrame(dictionary)

    print(df)
    
Output:

    
                                                                                                               
                                                   Links                                             Skills
    0  https://www.livecareer.com//resume-search/r/so...  agile, AutoCAD, C++, CAD, Oral, data entry, da...
    1  https://www.livecareer.com//resume-search/r/so...  Senior Outreach at Senior Center Planned and o...
    2  https://www.livecareer.com//resume-search/r/so...  Motivated Software Engineer seeking employment...
    3  https://www.livecareer.com//resume-search/r/so...  Developed peer-to-peer secure file transfer sy...
    4  https://www.livecareer.com//resume-search/r/so...  Performance evaluation of In-Kernel System Cal...
    5  https://www.livecareer.com//resume-search/r/so...  Accomplished Computer Technician with a rapidl...
    6  https://www.livecareer.com//resume-search/r/so...  Seeking a fulltime position as a Developer / S...
    7  https://www.livecareer.com//resume-search/r/so...  3 Years of experience in Information Technolog...
    8  https://www.livecareer.com//resume-search/r/so...  .NET 3.5, .NET, ASP .NET 3.5, ASP.NET 2.0, ASP...
    9  https://www.livecareer.com//resume-search/r/so...  Represent Stanford  Ballroom Dance team in var...
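The links in the table above contain a double slash (`livecareer.com//resume-search`), which suggests that each `href` already starts with `/` and is being appended to a base URL that ends with one. A minimal sketch of one way to avoid that, using `urllib.parse.urljoin` from the standard library; the `absolute_link` helper and the example path are hypothetical:

```python
from urllib.parse import urljoin

BASE_URL = "https://www.livecareer.com/"


def absolute_link(href):
    """Resolve an href from the results page against the site root."""
    return urljoin(BASE_URL, href)


# Hypothetical href, shaped like the truncated links shown above.
print(absolute_link("/resume-search/r/example-resume"))
# https://www.livecareer.com/resume-search/r/example-resume
```

`urljoin` gives the same result whether or not the `href` has a leading slash, so the loop that builds `links` would not need to care which form the site uses.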
    

Hope this helps!

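Going to throw this out there.

As mentioned, using Chrome: right-click the element you want, click Inspect, then press Ctrl+F to open the search box.

Then write an XPath that returns one (and only one) page object.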
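The "one and only one" rule can also be checked in code. The sketch below is only an illustration: `find_unique` is a hypothetical helper, the sample markup is invented, and the standard-library `xml.etree.ElementTree` (a limited XPath subset on well-formed markup) is used so the example runs without a browser.

```python
import xml.etree.ElementTree as ET


def find_unique(root, xpath):
    """Return the single element matching xpath, or raise if 0 or several match."""
    matches = root.findall(xpath)
    if len(matches) != 1:
        raise ValueError(
            f"{xpath!r} matched {len(matches)} elements, expected exactly 1"
        )
    return matches[0]


# Invented markup with two divs that share part of their class attribute.
SAMPLE = """<html><body>
  <div class="field">Summary text</div>
  <div class="field singlecolumn">Python, SQL</div>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Specific enough: exactly one div has this class attribute value.
print(find_unique(root, ".//div[@class='field singlecolumn']").text)  # Python, SQL

# Too loose: every div matches, so the helper refuses to guess.
try:
    find_unique(root, ".//div")
except ValueError as err:
    print(err)
```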

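Example:

Create an XPath for that item. Never use the "Copy XPath" option. You will get a path like the one below, which is about the most brittle XPath you can write: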

    //*[@id="wrapper"]/div[2]/div[2]/div[1]/aside[1]/div/div/div[2]/div/div[1]/a
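To see why a positional path like the one above is brittle, consider this small sketch. The markup, the `resume-link` class and the `count_matches` helper are invented for the example, and `xml.etree.ElementTree` from the standard library stands in for the browser. A single extra `div` is enough to break the positional path, while an attribute-based one keeps working.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<html><body>
  <div id="wrapper">
    <div>header</div>
    <div><a class="resume-link" href="/r/1">Resume</a></div>
  </div>
</body></html>"""

# The same page after a banner is inserted at the top of the wrapper.
REDESIGNED = """<html><body>
  <div id="wrapper">
    <div>banner</div>
    <div>header</div>
    <div><a class="resume-link" href="/r/1">Resume</a></div>
  </div>
</body></html>"""

POSITIONAL = "./body/div/div[2]/a"          # "the link in the second inner div"
BY_ATTRIBUTE = ".//a[@class='resume-link']"  # "the link with this class"


def count_matches(page_source, xpath):
    """Count how many elements the XPath selects in the given page."""
    return len(ET.fromstring(page_source).findall(xpath))


print(count_matches(ORIGINAL, POSITIONAL), count_matches(REDESIGNED, POSITIONAL))      # 1 0
print(count_matches(ORIGINAL, BY_ATTRIBUTE), count_matches(REDESIGNED, BY_ATTRIBUTE))  # 1 1
```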
    
If you are new to writing XPath, read up on it here

One problem here is that the text sits directly in the div tag, when it should be in a span. You can try the following:

    //div[@class='field singlecolumn']/text()
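Whichever selector is used, the raw text may still carry debris such as the `\xa0`, `\u200b` and `\r\n` sequences visible in the output earlier on this page. The following is a small sketch of one way to tidy it up; `clean_text` and `split_skills` are hypothetical helpers, and splitting on commas is an assumption that only suits the fields that really are comma-separated skill lists.

```python
import re


def clean_text(raw):
    """Collapse whitespace and drop invisible characters from scraped text."""
    text = raw.replace("\xa0", " ").replace("\u200b", "")
    return re.sub(r"\s+", " ", text).strip()


def split_skills(raw):
    """Split a comma-separated skills string into a list of trimmed entries."""
    parts = (part.strip() for part in clean_text(raw).split(","))
    return [part for part in parts if part]


print(clean_text("Senior Center\xa0Planned\r\n     and\u200b done "))
# Senior Center Planned and done
print(split_skills("agile, AutoCAD,  C++ ,\r\n SQL"))
# ['agile', 'AutoCAD', 'C++', 'SQL']
```

Fields that hold a free-text summary rather than a list, as several entries in the output do, would need a different treatment than a comma split.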
    

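Thanks, this really helps, but I need XPath because I am using scrapy and I am not familiar with BeautifulSoup.

You can start learning BeautifulSoup, as it is a very powerful library. I suggest you start learning it. Also, a humble request from my side: could you accept my answer as the best answer? But if you do want to use scrapy, then you will have to follow Steven Hardy's answer. Still, BeautifulSoup is the best library for scraping information from websites, so decide which one you want to use.

Thanks, this really helps, but I need XPath because I am using scrapy and I am not very familiar with beautifulsoup. Can you help me?

That is @Sushil's answer that mentions beautifulsoup. You will find what you want in my answer: instructions on how to work out the XPath of any element on the page...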