Python: How do I parse an HTML page with lxml when <br/> tags get in the way?
I want to use lxml in Python to parse the following HTML snippet from the NASA website:
<p>
<strong>Launch Date:</strong>1981-09-24<br/>
<strong>Launch Vehicle:</strong> Delta<br/>
<strong>Launch Site:</strong> Cape Canaveral, United States<br/>
<strong>Mass:</strong> 550.0 kg<br/>
</p>
However, the values after the headings come out empty:
Launch Date:
Launch Vehicle:
Launch Site:
Mass:
I suspect this has something to do with the <br/> tags.
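Can anyone help me find a solution?
Iterate over the strong tags, treating each one as a label, and take the first following text sibling as its value: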
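# page is assumed to be the already-parsed lxml document (it is not defined in this snippet)
rows = page.xpath('//div[@class="urtwo"]/p//strong')
for element in rows:
    label = element.text.strip()
    # the text right after </strong> is the first following text sibling
    value = element.xpath("following-sibling::text()")[0].strip()
    print(label, value)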
This prints (Python 2 output; \xa0 is a non-breaking space):
('Launch Date:', u'1981-09-24')
(u'Launch\xa0Vehicle:', u'Delta')
(u'Launch\xa0Site:', u'Cape Canaveral, United States')
('Mass:', u'550.0\xa0kg')
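For reference, the same approach can be sketched as a self-contained Python 3 script. The SAMPLE markup, the clean helper, and the extract_fields function are hypothetical names introduced here for illustration; the div with class "urtwo" is assumed from the XPath in the answer, not confirmed against the live page. The sketch also replaces the non-breaking spaces visible in the output above with ordinary spaces.

```python
from lxml import html

# Hypothetical sample that mirrors the snippet from the question. The
# surrounding <div class="urtwo"> is an assumption taken from the XPath
# used in the answer.
SAMPLE = """
<div class="urtwo">
<p>
<strong>Launch Date:</strong>1981-09-24<br/>
<strong>Launch Vehicle:</strong> Delta<br/>
<strong>Launch Site:</strong> Cape Canaveral, United States<br/>
<strong>Mass:</strong> 550.0 kg<br/>
</p>
</div>
"""


def clean(text):
    """Turn non-breaking spaces into plain spaces and trim whitespace."""
    return (text or "").replace("\xa0", " ").strip()


def extract_fields(markup):
    """Map each <strong> label to the text that directly follows it."""
    page = html.fromstring(markup)
    fields = {}
    for element in page.xpath('//div[@class="urtwo"]/p//strong'):
        # Drop the trailing colon so the label works as a dictionary key.
        label = clean(element.text).rstrip(":")
        # The first following text sibling is the text right after
        # </strong>; lxml exposes the same string as element.tail.
        siblings = element.xpath("following-sibling::text()")
        fields[label] = clean(siblings[0]) if siblings else ""
    return fields


if __name__ == "__main__":
    for label, value in extract_fields(SAMPLE).items():
        print(label, value)
```

In lxml, text that follows a closing tag belongs to that element's tail rather than its text, which is one plausible reason the values looked empty when only the text of each strong element was read.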
Wow! Thank you very much!