
How to extract data using XPath in Python when HTML classes have the same name


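I am trying to extract the values 51011020, Recife and Boa Viagem separately, but I cannot work out how an expression can tell these elements apart, because their classes all have the same name.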

In [24]: response.xpath('//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()')
Out[24]: 
[<Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='51011020'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Recife'>,
 <Selector xpath='//div[@class="h3us20-5 jHoWDW"]//div[@class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG"]/dd[@class="sc-ifAKCX sc-1f2ug0x-1 kFBcla"]/text()' data='Boa Viagem'>]
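When I run the code above, it returns all three values at once. How can I get each of them separately? An explanation would be much appreciated.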

<div class="h3us20-5 jHoWDW">
    <div class="h3us20-2 fMOiyI">
        <div flexDirection="column" class="sc-jTzLTM sc-ksYbfQ uUqze">
            <span weight="semiBold" theme="[object Object]" tag="span" color="dark" font-weight="400" class="sc-ifAKCX dqTZSU">Localização</span>
            <div class="h3us20-4 eowFbc"></div>
            <div data-testid="ad-properties" class="sc-bwzfXH h3us20-0 cBfPri">
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">CEP</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">51011020</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Município</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Recife</dd>
                    </div>
                </div>
                <div class="sc-1ys3xot-0 h3us20-0 jyICCp">
                    <div mt="3" block="true" class="sc-jTzLTM sc-ksYbfQ sc-1f2ug0x-3 jcodVG">
                        <dt tag="dt" theme="[object Object]" color="dark" weight="" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-0 btrQrs">Bairro</dt>
                        <dd weight="semiBold" tag="dd" theme="[object Object]" color="dark" font-weight="400" class="sc-ifAKCX sc-1f2ug0x-1 kFBcla">Boa Viagem</dd>
                    </div>
                </div>
            </div>
        </div>
        <div class="h3us20-4 hrzRZZ"></div>
    </div>
</div>

Since you need each value separately, you need three different XPath expressions.

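You can use a positional index ([1], [2], [3]) together with parentheses (()), so that the index applies to the whole set of matches rather than to each parent's children.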
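The effect of the parentheses can be checked with a small sketch. It assumes lxml is installed (Scrapy selectors are built on it) and uses a trimmed copy of the question's markup with most attributes removed; the expression `(//dd)[n]` is an illustration of the technique, not necessarily the exact expression from the original answer.

```python
from lxml import html

# Trimmed copy of the markup from the question (most attributes omitted).
PAGE = """
<div data-testid="ad-properties">
  <div><div><dt>CEP</dt><dd>51011020</dd></div></div>
  <div><div><dt>Município</dt><dd>Recife</dd></div></div>
  <div><div><dt>Bairro</dt><dd>Boa Viagem</dd></div></div>
</div>
"""

tree = html.fromstring(PAGE)

# Without parentheses, [1] means "the first dd inside its own parent".
# Every dd here is the first dd of its parent, so all three match.
unparenthesized = tree.xpath('//dd[1]/text()')

# With parentheses, the index is applied to the combined node-set
# in document order, so each expression selects exactly one element.
first = tree.xpath('(//dd)[1]/text()')
second = tree.xpath('(//dd)[2]/text()')
third = tree.xpath('(//dd)[3]/text()')

print(unparenthesized)
print(first, second, third)
```

In a Scrapy shell the same expression strings could be passed to `response.xpath(...)`, with `.get()` returning the single matching string.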

Or a text predicate ([.="text"]) with an axis (following-sibling):

//dt[.="CEP"]/following-sibling::dd/text()
//dt[.="Município"]/following-sibling::dd/text()
//dt[.="Bairro"]/following-sibling::dd/text()

Output in both cases:

51011020
Recife
Boa Viagem
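The label-based expressions can be wrapped in a small lookup, sketched below. It assumes lxml is installed and works on a trimmed copy of the question's markup; `value_for` is a hypothetical helper name, and the `$label` XPath variable is an lxml feature used here to avoid building the expression by string concatenation.

```python
from lxml import html

# Trimmed copy of the markup from the question (most attributes omitted).
PAGE = """
<div data-testid="ad-properties">
  <div><div><dt>CEP</dt><dd>51011020</dd></div></div>
  <div><div><dt>Município</dt><dd>Recife</dd></div></div>
  <div><div><dt>Bairro</dt><dd>Boa Viagem</dd></div></div>
</div>
"""

tree = html.fromstring(PAGE)


def value_for(label):
    """Hypothetical helper: text of the dd that follows the dt named `label`."""
    # $label is bound from the keyword argument passed to xpath().
    matches = tree.xpath(
        '//dt[.=$label]/following-sibling::dd/text()', label=label
    )
    # Return None instead of raising when the label is not on the page.
    return matches[0] if matches else None


for name in ("CEP", "Município", "Bairro"):
    print(name, value_for(name))
```

Selecting by label rather than by position tends to survive reordering of the fields, which matters on pages where the class names are generated and the same for every row.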