Python 2.7: How to scrape nested data with Selenium and Python (tags: python-2.7, selenium, xpath)


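I basically want to get "Litigation Paralegal" from the h3 and "Olswang" from the span below, but I can't see how to do it. Here is the HTML code: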

<div class="pv-entity__summary-info">

<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>

<h4>
  <span class="visually-hidden">Company Name</span>
  <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>


  <div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
      <span class="visually-hidden">Dates Employed</span>
      <span>Feb 2016 – Present</span>
    </h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
        <span class="visually-hidden">Employment Duration</span>
        <span class="pv-entity__bullet-item">1 yr 2 mos</span>
      </h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
      <span class="visually-hidden">Location</span>
      <span class="pv-entity__bullet-item">London, United Kingdom</span>
    </h4></div>

</div>

My output:

Experience title:
[]

[]
Your XPath expressions are incorrect:

  • /*[@class="Sans-17px-black-85%-semibold"]/h3/text() selects the text content of an h3 that is a child of an element whose class attribute is "Sans-17px-black-85%-semibold". Instead, you need

    //h3[@class="Sans-17px-black-85%-semibold"]/text()

    which selects the text content of the h3 element whose own class attribute is "Sans-17px-black-85%-semibold".

  • In /*[@class="pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%"]text() you forgot the slash before text() (you need /text(), not just text()). Also, the target span does not have the class name pv-position-entity__secondary-title. You need to use

    //span[@class="pv-entity__secondary-title Sans-15px-black-55%"]/text()
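
The difference between the two XPath shapes can be checked offline. The following is only a minimal sketch, assuming a trimmed copy of the markup from the question; it uses the standard library's xml.etree.ElementTree rather than whatever parser the original code used. ElementTree supports only a subset of XPath and has no text() step, so the sketch selects the element and reads its .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question. It is well-formed,
# so an XML parser accepts it (real pages usually need an HTML parser).
SNIPPET = """
<div class="pv-entity__summary-info">
  <h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
  <h4>
    <span class="visually-hidden">Company Name</span>
    <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
  </h4>
</div>
"""

root = ET.fromstring(SNIPPET)

# Wrong shape: looks for an h3 that is a CHILD of the element carrying the class.
wrong = root.findall('.//*[@class="Sans-17px-black-85%-semibold"]/h3')

# Right shape: the h3 (or span) itself carries the class attribute.
title = root.find('.//h3[@class="Sans-17px-black-85%-semibold"]')
company = root.find('.//span[@class="pv-entity__secondary-title Sans-15px-black-55%"]')

print(wrong)         # empty list, matching the "[]" output in the question
print(title.text)
print(company.text)
```

Note that [@class="..."] compares the whole attribute string, so the value must list every class exactly as it appears in the markup.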


Using CSS selectors, you can easily get both values. I find them easier to read and understand than XPath:

driver.find_element_by_css_selector("div.pv-entity__summary-info > h3").text
driver.find_element_by_css_selector("div.pv-entity__summary-info span.pv-entity__secondary-title").text
. indicates a class name
> indicates a child (exactly one level down)
a space indicates a descendant (any level below)
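
The child versus descendant distinction can be tried without a browser. This is a rough analogy only: the standard library's xml.etree.ElementTree has no CSS selector support, so the sketch assumes its path syntax as a stand-in, where a bare tag name behaves like the > combinator and .// behaves like the space combinator. The markup is a shortened, hypothetical version of the snippet in the question.

```python
import xml.etree.ElementTree as ET

# Shortened version of the question's markup: h3 is a direct child
# of the div, while the span is nested one level deeper inside h4.
root = ET.fromstring(
    '<div class="pv-entity__summary-info">'
    '<h3>Litigation Paralegal</h3>'
    '<h4><span class="pv-entity__secondary-title">Olswang</span></h4>'
    '</div>'
)

# Child step (like "div > h3"): h3 sits directly under the div, so it matches.
child_h3 = root.findall("h3")

# Child step (like "div > span"): the span is inside h4, so nothing matches.
child_span = root.findall("span")

# Descendant step (like "div span"): matches at any depth below the div.
descendant_span = root.findall(".//span")

print(len(child_h3), len(child_span), len(descendant_span))
```

This is why the first selector above can use > for the h3, while the second one needs a space to reach the span inside the h4.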
