Python 2.7: How to scrape nested data using Selenium and Python
Tags: python-2.7, selenium, xpath

I basically want to scrape Litigation Paralegal (inside the h3) and Olswang (inside the span) from the markup below, but I can't seem to do it. Here is the HTML code:
<div class="pv-entity__summary-info">
<h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
<h4>
<span class="visually-hidden">Company Name</span>
<span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
</h4>
<div class="pv-entity__position-info detail-facet m0"><h4 class="pv-entity__date-range Sans-15px-black-55%">
<span class="visually-hidden">Dates Employed</span>
<span>Feb 2016 – Present</span>
</h4><h4 class="pv-entity__duration de Sans-15px-black-55% ml0">
<span class="visually-hidden">Employment Duration</span>
<span class="pv-entity__bullet-item">1 yr 2 mos</span>
</h4><h4 class="pv-entity__location detail-facet Sans-15px-black-55% inline-block">
<span class="visually-hidden">Location</span>
<span class="pv-entity__bullet-item">London, United Kingdom</span>
</h4></div>
</div>
My output:
Experience title: []
[]
Your XPath expressions are incorrect:
- The expression
  //*[@class="Sans-17px-black-85%-semibold"]/h3/text()
  selects the text content of an h3 that is a child of an element whose class attribute is "Sans-17px-black-85%-semibold". Instead you need
  //h3[@class="Sans-17px-black-85%-semibold"]/text()
  which selects the text content of an h3 element whose class attribute is "Sans-17px-black-85%-semibold".
- In
  //*[@class="pv-position-entity__secondary-title pv-entity__secondary-title Sans-15px-black-55%"]text()
  you forgot the slash before text() (you need /text(), not just text()). Also, the target span has no class name pv-position-entity__secondary-title. You need to use
  //span[@class="pv-entity__secondary-title Sans-15px-black-55%"]/text()
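The difference between the two XPath shapes can be checked offline. The following is only a minimal sketch: it uses the standard library's xml.etree.ElementTree rather than whatever parser the question used, and ElementTree supports just a subset of XPath, so the paths start with `.` and `.text` stands in for `/text()`. The name SNIPPET is illustrative, and the HTML is a trimmed copy of the markup from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's HTML (well-formed, so ElementTree can parse it).
SNIPPET = """
<div class="pv-entity__summary-info">
  <h3 class="Sans-17px-black-85%-semibold">Litigation Paralegal</h3>
  <h4>
    <span class="visually-hidden">Company Name</span>
    <span class="pv-entity__secondary-title Sans-15px-black-55%">Olswang</span>
  </h4>
</div>
"""

root = ET.fromstring(SNIPPET)

# Original shape: an h3 that is a CHILD of the element carrying the class.
# The h3 itself carries the class and has no h3 children, so nothing matches.
wrong = root.findall(".//*[@class='Sans-17px-black-85%-semibold']/h3")

# Corrected shape: the h3 (or span) itself carries the class attribute.
title = root.find(".//h3[@class='Sans-17px-black-85%-semibold']").text
company = root.find(
    ".//span[@class='pv-entity__secondary-title Sans-15px-black-55%']"
).text

print(wrong)    # []
print(title)    # Litigation Paralegal
print(company)  # Olswang
```

Note that `[@class='...']` compares the whole attribute string, so the value must list every class exactly as it appears in the markup.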
driver.find_element_by_css_selector("div.pv-entity__summary-info > h3").text
driver.find_element_by_css_selector("div.pv-entity__summary-info span.pv-entity__secondary-title").text
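In these selectors:
. denotes a class name
> denotes a child (exactly one level below)
a space denotes a descendant (any level below)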
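The child versus descendant distinction can be tried without a browser. This is a rough sketch, not the Selenium API: it assumes the standard library's xml.etree.ElementTree, where a single `/` step plays the role of the CSS `>` combinator and `//` plays the role of the space. The markup is a simplified version of the question's HTML, with class names shortened for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified version of the question's markup (classes shortened for illustration).
div = ET.fromstring(
    '<div class="pv-entity__summary-info">'
    '<h3>Litigation Paralegal</h3>'
    '<h4><span class="pv-entity__secondary-title">Olswang</span></h4>'
    '</div>'
)

# Like "div > h3": the h3 is a direct child, so a one-level step finds it.
child_h3 = div.find("./h3")

# Like "div > span": the span sits two levels down (inside h4), so no match.
child_span = div.find("./span")

# Like "div span": a descendant step reaches the span at any depth.
descendant_span = div.find(".//span[@class='pv-entity__secondary-title']")

print(child_h3.text)         # Litigation Paralegal
print(child_span)            # None
print(descendant_span.text)  # Olswang
```

This is why the first selector can use `>` for the h3, while the second needs the space to reach the span nested inside the h4.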