Python, lxml: retrieving all elements in a list
I am trying to get all the elements of a list from a website, given the following HTML snippet:
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
Evaluating the XPath below against this snippet prints:
[' James ', ' Male ', ' 5\'8" ']
The XPath
//li[../li[@class="name" and position()=1]]/text()
means:
//li                   # all li elements
[                      # whose
  ..                   # parent
  /                    # has a child
  li                   # li element
  [                    # whose
    @class="name"      # class attribute equals "name"
    and                # and
    position()=1]      # which is the first li child of that parent
]
/text()                # return the text of those elements
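To try this expression yourself, here is a minimal sketch that evaluates it with lxml against the snippet from the question. The variable names `snippet` and `items` are only illustrative, and this is one way to run the query, not necessarily the exact code the answer used.

```python
from lxml import html

# The HTML snippet from the question (variable name is illustrative)
snippet = '''<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>'''

tree = html.fromstring(snippet)

# Select the text of every <li> whose parent's first <li> child
# carries class="name"
items = tree.xpath('//li[../li[@class="name" and position()=1]]/text()')
print(items)  # [' James ', ' Male ', ' 5\'8" ']
```

The predicate is evaluated once per `li`, so every item of a matching list is returned, not just the one with the `name` class.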
from lxml import html
text = '''<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>'''
tree = html.fromstring(text)
# Loop over every <ul> that has an <li class="name"> child
for ul in tree.xpath('//ul[li[@class="name"]]'):
    print(ul.xpath("li/text()"))  # text of all <li> children of this <ul>
[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
[' James ', ' Male ', ' 5\'8" ']
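The values above still carry the surrounding spaces from the markup. If you want clean strings, a possible follow-up is to strip each text node while collecting one record per list. This is only a sketch under the same assumptions as the code above; `records` is a name introduced here for illustration, and the input is shortened to two lists.

```python
from lxml import html

# Two copies of the list from the question, to keep the example short
text = '''<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>
<ul>
<li class="name"> James </li>
<li> Male </li>
<li> 5'8" </li>
</ul>'''

tree = html.fromstring(text)

# One inner list per matching <ul>, with whitespace stripped from each item
records = [
    [t.strip() for t in ul.xpath("li/text()")]
    for ul in tree.xpath('//ul[li[@class="name"]]')
]
print(records)
```

Grouping by `ul` keeps the items of each list together, which the single flat XPath query does not do when several lists are on the page.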