Select the parent of a specific node using XPath/Python

How can I get the href value of the a element in this HTML fragment?

I need to select it based on the class of the i tag.

<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>           
-->

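Your code does work for me: it returns a list. Alternatively, you can find the <i> elements first and simply use /.. to get back to the <a> elements.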


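Note that the target element is inside an HTML comment. You cannot reach it with a plain XPath such as //a, because in this case it is not a node but simply a string.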
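To make the point above concrete, here is a minimal sketch using lxml. The `page` string is a hypothetical document built around the fragment from the question; only the commented-out markup comes from the original post.

```python
import lxml.html

# Hypothetical page: the link exists only inside an HTML comment.
page = '''<html><body>
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>
-->
</body></html>'''

tree = lxml.html.fromstring(page)

links = tree.xpath('//a')             # element lookup: the parser built no <a> node
comments = tree.xpath('//comment()')  # the markup is only reachable as comment text

print(len(links), len(comments))
print(comments[0].text)
```

The comment node's `.text` holds the raw markup as a string, which is why it has to be searched or re-parsed rather than queried directly.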

Try the following code:

import re

# Assumes `tree` is the already-parsed document (for example an lxml tree).
foo_links = tree.xpath('//comment()')  # get list of all comments on page
for link in foo_links:
    if '<i class="foobar">' in link.text:
        # get href value from required comment (raw string, literal dot escaped)
        href = re.search(r'\w+://\w+\.\w+', link.text).group(0)
        break

Also, you may need a more complex regular expression to match the link URL.
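As one possible refinement, the sketch below captures whatever sits between the quotes of the href attribute instead of guessing at the URL's shape. The `comment_text` value and the `HREF_RE` name are hypothetical; the URL has a path and query string added to show where the simpler pattern would stop short.

```python
import re

# Hypothetical comment text, shaped like the fragment in the question.
comment_text = '''
<a href="https://link.com/path?x=1" target="_blank"><i class="foobar"></i>  </a>
'''

# Match href="..." or href='...' and capture the quoted value (group 2).
HREF_RE = re.compile(r'''href\s*=\s*(["'])(.*?)\1''')

match = HREF_RE.search(comment_text)
href = match.group(2) if match else None
print(href)
```

This still treats the comment as plain text, so it stays a heuristic; parsing the comment's content as HTML is the sturdier route when that is an option.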

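Just curious, why not just //a/@href?

@svasa The OP said "I need to select it based on the class of the i tag".

OK, I hadn't seen that. Got it.

This seems to work best. I commented out/removed the break and got what I wanted.
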
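# Option 1: select each <a> that has a child <i class="foobar">, then read its href.
hrefs = tree.xpath('//a[i/@class="foobar"]/@href')
# Option 2: find the <i> first, then step back up to its parent <a> with "..".
hrefs = tree.xpath('//a/i[@class="foobar"]/../@href')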
#                     ^                    ^  ^
#                     |                    |  obtain the 'href'
#                     |                    |
#                     |                    get the parent of the <i>
#                     |
#                     find all <i class="foobar"> contained in an <a>.
import lxml.html

# If the markup is inside an HTML comment, parse each comment's text as HTML first.
hrefs = [href for comment in tree.xpath('//comment()')
              # find all comments
              for href in lxml.html.fromstring(comment.text)
              # parse content of comment as a new HTML file
                              .xpath('//a[i/@class="foobar"]/@href')
              # read those hrefs.
]
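For a version that can be run as-is, the sketch below wraps the comment-parsing approach in a helper. The function name `hrefs_from_comments` and the sample `page` are hypothetical, and the guard against empty comments is an addition that is not in the original answer.

```python
import lxml.html


def hrefs_from_comments(tree, css_class="foobar"):
    """Collect hrefs of <a> elements with a child <i class=css_class>
    that appear inside HTML comments of `tree`."""
    hrefs = []
    for comment in tree.xpath('//comment()'):
        text = comment.text
        if not text or not text.strip():
            continue  # added guard: skip empty comments, which cannot be parsed
        fragment = lxml.html.fromstring(text)  # re-parse the comment body as HTML
        hrefs.extend(fragment.xpath('//a[i/@class=$cls]/@href', cls=css_class))
    return hrefs


# Hypothetical page built around the fragment from the question.
page = '''<html><body>
<!-- just a note -->
<!--
<a href="https://link.com" target="_blank"><i class="foobar"></i>  </a>
-->
</body></html>'''

hrefs = hrefs_from_comments(lxml.html.fromstring(page))
print(hrefs)
```

Passing the class through an XPath variable (`$cls`) avoids building the expression by string concatenation.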