Python: How to select a dictionary?
Please help me write an XPath expression. The HTML:
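<div class="TabItem">
<p><strong>Product Composition</strong></p>
<p>93% Polyamide 7% Elastane</p>
<p>Lining: 100% Polyester</p><p>Dress Length: 90 cm</p>
<p><strong>Product Attributes;</strong></p>
<p>: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side</p>
<p>Lining Type: Full Lining</p>
</div>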
From this HTML I need to get the following dictionary:
data['Product Composition'] = '93% Polyamide 7% Elastane Lining: 100% Polyester Dress Length: 90 cm'
data['Product Attributes;'] = ': Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side Lining Type: Full Lining'
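It is important that the number of elements can vary, i.e. I need a generic solution.

Get each strong tag inside a p, then take its parent and walk the parent's following siblings until you reach another p tag that contains a strong tag, or until there are no more siblings: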
from lxml.html import fromstring

html_data = """<div class="TabItem">
<p><strong>Product Composition</strong></p>
<p>93% Polyamide 7% Elastane</p>
<p>Lining: 100% Polyester</p><p>Dress Length: 90 cm</p>
<p><strong>Product Attributes;</strong></p>
<p>: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side</p>
<p>Lining Type: Full Lining</p>
</div>"""

tree = fromstring(html_data)

data = {}
# Every <strong> that sits directly inside a <p> is treated as a heading.
for strong in tree.xpath('//p/strong'):
    parent = strong.getparent()

    description = []
    next_p = parent.getnext()
    # Collect sibling paragraphs until the next heading or the end of the siblings.
    while next_p is not None and not next_p.xpath('.//strong'):
        description.append(next_p.text)
        next_p = next_p.getnext()

    data[strong.text] = " ".join(description)

print(data)
But the number of elements may differ: there are 2 now, but there could be 10, or just 1.
Running the code prints:
{'Product Composition': '93% Polyamide 7% Elastane Lining: 100% Polyester Dress Length: 90 cm',
 'Product Attributes;': ': Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side Lining Type: Full Lining'}
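Since the question asked for an XPath expression, here is a sketch of the same grouping done mostly in XPath rather than with getnext(). It is not part of the original answer. The function name sections_to_dict is made up for illustration, and the sketch assumes that all heading and body paragraphs are siblings inside a single container, as in the sample HTML. For each heading it selects the plain paragraphs that have exactly that many heading paragraphs before them, so the number of paragraphs per heading can be anything, including zero.

```python
from lxml.html import fromstring

html_data = """<div class="TabItem">
<p><strong>Product Composition</strong></p>
<p>93% Polyamide 7% Elastane</p>
<p>Lining: 100% Polyester</p><p>Dress Length: 90 cm</p>
<p><strong>Product Attributes;</strong></p>
<p>: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side</p>
<p>Lining Type: Full Lining</p>
</div>"""


def sections_to_dict(html):
    """Map each <p><strong>heading</strong></p> to the text of the <p> siblings after it.

    Hypothetical helper; assumes every paragraph shares one parent element.
    """
    tree = fromstring(html)
    data = {}
    headings = tree.xpath('//p[strong]')
    for index, heading in enumerate(headings, start=1):
        # Plain paragraphs after this heading that are preceded by exactly
        # `index` heading paragraphs, i.e. the ones belonging to this heading.
        # $n is an XPath variable, supplied through the keyword argument n.
        body = heading.xpath(
            'following-sibling::p[not(strong)]'
            '[count(preceding-sibling::p[strong]) = $n]',
            n=index,
        )
        key = heading.findtext('strong')
        # text_content() also keeps text nested inside inline tags.
        data[key] = ' '.join(p.text_content().strip() for p in body)
    return data


result = sections_to_dict(html_data)
print(result)
```

Using text_content() instead of .text means a paragraph such as <p>Lining: <em>100%</em> Polyester</p> would still be captured in full, whereas .text returns only the text before the first child element.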