How to extract data with XPath in Python when HTML classes have the same name
I am trying to extract the values 51011020, Recife, and Boa Viagem separately, but I can't see how an XPath expression can tell these elements apart, because their classes have the same name.
In [24]: response.xpath('//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()')
Out[24]:
[<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='51011020'>,
<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Recife'>,
<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Boa Viagem'>]
When I run the code above, it returns all three values at once. How can I get them individually? An explanation would be much appreciated.
<div class="h3us20-5 jHoWDW">
<div class="h3us20-2 fMOiyI">
<div flexDirection="column" class="sc-jTzLTM sc-ksYbfQ uUqze">
<span weight="semiBold" theme="[object Object]" tag="span" color="dark" font-weight="400" class="sc-ifAKCX dqTZSU">Localização</span>
<div class="h3us20-4 eowFbc"></div>
<div data-testid="ad-properties" class="sc-bwzfXH h3us20-0 cBfPri">
<div class="sc-1ys3xot-0 h3us20-0 jyICCp">
<div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
<dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">CEP</dt>
<dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd>
</div>
</div>
<div class="sc-1ys3xot-0 h3us20-0 jyICCp">
<div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
<dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Município</dt>
<dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd>
</div>
</div>
<div class="sc-1ys3xot-0 h3us20-0 jyICCp">
<div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
<dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Bairro</dt>
<dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd>
</div>
</div>
</div>
</div>
<div class="h3us20-4 hrzRZZ"></div>
</div>
</div>
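Since you need the values separately, you need three different XPath expressions.
You can either use positional indexes ([1], [2], [3]) together with parentheses ( ), or use a text predicate (.="...") with the following-sibling axis. The output is the same in both cases. With the text predicate, the expressions, followed by their output, are: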
//dt[.="CEP"]/following-sibling::dd/text()
//dt[.="Município"]/following-sibling::dd/text()
//dt[.="Bairro"]/following-sibling::dd/text()
51011020
Recife
Boa Viagem
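
The answer describes the positional approach only in words, so here is a minimal sketch of both approaches side by side. It is an illustration under assumptions rather than the answerer's exact code. The markup is a trimmed copy of the HTML in the question, and lxml stands in for the Scrapy response so the snippet runs on its own. In a Scrapy shell the same expressions could be passed to response.xpath(...).get(). The names SNIPPET, DD, by_position, by_label and without_parens are made up for this example.

```python
from lxml import html

# Trimmed copy of the markup from the question: three <dt>/<dd> pairs whose
# <dd> elements all carry exactly the same class attribute.
SNIPPET = """
<div class="h3us20-5 jHoWDW">
  <div class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
    <dt class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">CEP</dt>
    <dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd>
  </div>
  <div class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
    <dt class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Município</dt>
    <dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd>
  </div>
  <div class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
    <dt class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Bairro</dt>
    <dd class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd>
  </div>
</div>
"""

tree = html.fromstring(SNIPPET)

# Expression matching every <dd> with the shared class (hypothetical helper name).
DD = '//dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]'

# Approach 1: positional index. The parentheses make [n] apply to the whole
# result set, so (expr)[2] means "the second match in the document".
by_position = [tree.xpath(f"({DD})[{n}]/text()")[0] for n in (1, 2, 3)]

# Without parentheses, [1] means "first matching <dd> within its own parent".
# Every <dd> here is the first one in its <div>, so all three still match.
without_parens = tree.xpath(f"{DD}[1]/text()")

# Approach 2: find the <dt> by its label text, then step to its sibling <dd>.
by_label = {
    label: tree.xpath(f'//dt[.="{label}"]/following-sibling::dd/text()')[0]
    for label in ("CEP", "Município", "Bairro")
}

print(by_position)
print(without_parens)
print(by_label)
```

The label-based form is usually the more robust of the two, because it keeps working if the page reorders the fields, whereas the positional form depends on the order of the elements.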