Python: how to select a dictionary? (python, html, xpath, html-parsing, lxml)

python html xpath

Please help me write an XPath expression.

HTML:


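Product Composition
93% Polyamide 7% Elastane
Lining: 100% Polyester
Dress Length: 90 cm
Product Attributes;
: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side
Lining Type: Full Lining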

From this HTML I need to get the following dictionary:

data['Product Composition'] = '93% Polyamide 7% Elastane Lining: 100% PolyesterDress Length: 90 cm'
data['Product Attributes;'] = ': Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side Lining Type: Full Lining'

It is important that the number of elements can vary, i.e. a general solution is needed.
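Get every strong tag inside a p, then take its parent and walk the parent's following siblings until you reach another p tag that contains a strong tag, or until there are no more siblings: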

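from lxml.html import fromstring

# Assumption: the <p>/<strong>/<br> markup is inferred from the XPath and the
# printed result; only the text of the original HTML is certain.
html_data = """<div class="TabItem">
<p><strong>Product Composition</strong></p>
<p>93% Polyamide 7% Elastane</p>
<p>Lining: 100% Polyester<br>Dress Length: 90 cm</p>
<p><strong>Product Attributes;</strong></p>
<p>: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side</p>
<p>Lining Type: Full Lining</p>
</div>"""

tree = fromstring(html_data)

data = {}
for strong in tree.xpath('//p/strong'):
    parent = strong.getparent()

    description = []
    next_p = parent.getnext()
    # Collect the following paragraphs until the next <p> that contains a <strong>.
    while next_p is not None and not next_p.xpath('.//strong'):
        description.append(next_p.text)
        next_p = next_p.getnext()

    data[strong.text] = " ".join(description)

print(data)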

But the number of elements may differ: there are 2 now, but there could be 10, or just 1.
The code prints:

{'Product Composition': '93% Polyamide 7% Elastane Lining: 100% Polyester', 'Product Attributes;': ': Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side Lining Type: Full Lining'}
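The printed result lacks "Dress Length: 90 cm", which the question asked for. In lxml, `.text` returns only the text before an element's first child, so anything after an inline tag is dropped. The sketch below is one possible variant, not the original answer: it wraps the same sibling walk in a function and uses `text_content()` instead. The function name `extract_sections` and the sample markup (in particular the `<br>`) are assumptions made for illustration.

```python
from lxml.html import fromstring


def extract_sections(html):
    """Map each <p><strong>heading</strong></p> to the text of the <p> siblings after it."""
    tree = fromstring(html)
    data = {}
    for strong in tree.xpath('//p/strong'):
        parts = []
        sibling = strong.getparent().getnext()
        # Walk the following siblings until the next heading paragraph or the end.
        while sibling is not None and not sibling.xpath('.//strong'):
            # text_content() also includes text that follows inline tags such
            # as <br>, which .text alone would leave out.
            parts.append(sibling.text_content().strip())
            sibling = sibling.getnext()
        data[strong.text_content().strip()] = " ".join(parts)
    return data


# Hypothetical markup: only the text comes from the question, the tags are assumed.
sample = """<div class="TabItem">
<p><strong>Product Composition</strong></p>
<p>93% Polyamide 7% Elastane</p>
<p>Lining: 100% Polyester<br>Dress Length: 90 cm</p>
<p><strong>Product Attributes;</strong></p>
<p>: Boat Neck, Long Sleeve, Midi, Zip, Concealed, Laced, Side</p>
<p>Lining Type: Full Lining</p>
</div>"""

result = extract_sections(sample)
print(result)
```

Because the loop is driven by whatever `//p/strong` matches, the same function works for one heading, ten headings, or a heading with no paragraphs after it (which maps to an empty string). If a separator is wanted where the `<br>` sits, joining `sibling.itertext()` with a space is one option.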