Python: how to get the ids of all children of a div


python xpath web-scraping


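I am using lxml to scrape a particular page. I know how to select tags by their id, but I can't find out how to extract the actual id attribute values.

For example, suppose the HTML is: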

<div id="stuff" >
    <div id="some unknown"> xxxx </div>
    <div id="another unknown"> xxxxx </div>
</div>

Is there a way to do this using only XPath?

If you need the ids of the direct children, you can use the following XPath query:

#                                       v obtain id attribute
document.xpath('//*[@id="stuff"]/*[@id]/@id')
#                 ^ #stuff tag   ^ child with id attribute

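So we first look for the tag whose id is "stuff", then for its direct children (of any tag) that have an @id attribute, and from those children we obtain the @id value.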

This returns a list of lxml.etree._ElementUnicodeResult objects. We can, however, use str(..) to obtain plain string values:

[str(the_id) for the_id in document.xpath('//*[@id="stuff"]/*[@id]/@id')]
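For completeness, here is a minimal end-to-end sketch of the query above. It assumes the page is parsed with lxml.html.fromstring and that the question's markup sits inside an html/body wrapper; the SAMPLE string and the variable names are illustrative only, not part of the original answer.

```python
from lxml import html

# Illustrative markup: the question's snippet wrapped in html/body.
SAMPLE = """
<html><body>
<div id="stuff">
    <div id="some unknown"> xxxx </div>
    <div id="another unknown"> xxxxx </div>
</div>
</body></html>
"""

document = html.fromstring(SAMPLE)

# Direct children of #stuff that carry an id attribute; take the attribute value.
raw_ids = document.xpath('//*[@id="stuff"]/*[@id]/@id')

# Convert lxml's string-like results into plain Python strings.
child_ids = [str(the_id) for the_id in raw_ids]
print(child_ids)
```

The results come back in document order, so the list follows the order of the children in the markup.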

Note that the query above does not care about the tag type of the children. If you only want the ids of div children, you can use:

#                                         v obtain id attribute
document.xpath('//*[@id="stuff"]/div[@id]/@id')
#                 ^ #stuff tag   ^ child with id attribute
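To see the effect of naming the tag in the child step, the following sketch runs both variants against hypothetical markup that mixes a div child and a span child. The MIXED string and its ids are made up for illustration.

```python
from lxml import html

# Hypothetical markup: #stuff has one div child and one span child.
MIXED = """
<html><body>
<div id="stuff">
    <div id="a-div"> xxxx </div>
    <span id="a-span"> xxxxx </span>
</div>
</body></html>
"""

document = html.fromstring(MIXED)

# Any direct child with an id attribute, regardless of tag.
any_tag_ids = [str(i) for i in document.xpath('//*[@id="stuff"]/*[@id]/@id')]

# Only direct children that are div elements with an id attribute.
div_only_ids = [str(i) for i in document.xpath('//*[@id="stuff"]/div[@id]/@id')]

print(any_tag_ids)
print(div_only_ids)
```

With this input the wildcard query picks up both children, while the div-restricted query skips the span.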

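If you want to look at all descendants rather than only direct children, just add an extra slash between the @id="stuff" step and the child step: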

#                                        v obtain id attribute
document.xpath('//*[@id="stuff"]//*[@id]/@id')
#                 ^ #stuff tag    ^ descendant with id attribute
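The difference between the single-slash and double-slash forms shows up once the markup is nested. The sketch below uses hypothetical nested markup (the NESTED string and its ids are assumptions for illustration) and compares the two queries.

```python
from lxml import html

# Hypothetical markup with ids at several nesting levels below #stuff.
NESTED = """
<html><body>
<div id="stuff">
    <div id="outer">
        <p id="inner">text</p>
    </div>
    <div>
        <span id="deep">no id on my parent</span>
    </div>
</div>
</body></html>
"""

document = html.fromstring(NESTED)

# Single slash: only direct children of #stuff that have an id.
direct_ids = [str(i) for i in document.xpath('//*[@id="stuff"]/*[@id]/@id')]

# Double slash: every descendant of #stuff that has an id, at any depth.
descendant_ids = [str(i) for i in document.xpath('//*[@id="stuff"]//*[@id]/@id')]

print(direct_ids)
print(descendant_ids)
```

Note that the #stuff element itself is not part of either result, since both queries start matching below it.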

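What XPath have you tried?

Honestly, I had no idea how to approach it. I tried root.xpath('/*[@id="stuff"]/div/') to get the divs themselves.

Thanks for the very detailed answer, I'll give it a try. Edit: it works!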
