XML XPath: selecting the grandparent and a specific uncle node


I am using XPath in R on XML with the following structure:

library(XML)

xml1 <- xmlParse('
<L0>
    <L1>
        <ID>Get this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N2>
            <L2>
                <L2N1>Get this node and all others in L2</L2N1>
            </L2>
        </L1N2>
        <L1N3>Ignore node 3</L1N3>
    </L1>
    <L1>
        <ID>Get this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N2>
            <L2>
                <L2N1>Get this node and all others in L2</L2N1>
            </L2>
        </L1N2>
        <L1N4>Ignore node 4</L1N4>
    </L1>
    <L1>
        <ID>Ignore this ID</ID>
        <L1N1>Ignore node 1</L1N1>
        <L1N3>Ignore node 3</L1N3>
        <L1N4>Ignore node 4</L1N4>
    </L1>
</L0>
                 ')
I can get the L1 nodes that have an L2 descendant:

getNodeSet(xml1, "//L1[descendant::L2]")
## [[1]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N3>Ignore node 3</L1N3> ## *Want to exclude this*
## </L1> 
## 
## [[2]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1> ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N4>Ignore node 4</L1N4> ## *Want to exclude this*
## </L1>
…but this also returns the sibling nodes I want to exclude. I can also get the ID nodes and the parents of L2:

getNodeSet(xml1, "//L1/*[self::ID | child::L2]")
## [[1]]
## <ID>Get this ID</ID>
##
## [[2]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2>
##
## [[3]]
## <ID>Get this ID</ID>
##
## [[4]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2>
##
## [[5]]
## <ID>Ignore this ID</ID>

…but now ID and L2 are separate rather than under L1, and the result also includes the ID from the third L1 node, which has no L2.

Can XPath return the desired result? If not, is there another way to achieve it in R?

This seems to be what you want (using your xml1):

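# Keep only the ID and L1N2 children of a node and drop everything else.
# Note: removeChildren() modifies internal nodes in place, so xml1 itself changes.
trim <- function(node) {
  names     <- names(node)
  to.remove <- names[!(names %in% c("ID","L1N2"))]
  removeChildren(node,kids=to.remove)
}
# xml1["..."] is shorthand for getNodeSet(xml1, "...")
lapply(xml1["//L1[descendant::L2]"],trim)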
#  [[1]]
# <L1>
#   <ID>Get this ID</ID>
#   <L1N2>
#     <L2>
#       <L2N1>Get this node and all others in L2</L2N1>
#     </L2>
#   </L1N2>
# </L1> 
# 
# [[2]]
# <L1>
#   <ID>Get this ID</ID>
#   <L1N2>
#     <L2>
#       <L2N1>Get this node and all others in L2</L2N1>
#     </L2>
#   </L1N2>
# </L1> 
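
Because removeChildren() edits the parsed document itself, the untrimmed nodes are gone from xml1 afterwards. If the original document is still needed, one option is to trim a copy of each node instead. The following is only a sketch under assumptions: trim_copy is a hypothetical helper name, the sample document is a shortened stand-in for xml1, and the children to drop are passed as node objects rather than by name.

```r
library(XML)

# Shortened stand-in for xml1: one L1 with an L2 descendant, one without.
doc <- xmlParse('
<L0>
  <L1>
    <ID>Get this ID</ID>
    <L1N1>Ignore node 1</L1N1>
    <L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2>
    <L1N3>Ignore node 3</L1N3>
  </L1>
  <L1>
    <ID>Ignore this ID</ID>
    <L1N1>Ignore node 1</L1N1>
  </L1>
</L0>', asText = TRUE)

# Hypothetical helper: clone the node, then remove unwanted children
# from the clone so the original document stays untouched.
trim_copy <- function(node, keep = c("ID", "L1N2")) {
  copy <- xmlClone(node)
  kids <- xmlChildren(copy)
  drop <- kids[!(names(kids) %in% keep)]
  if (length(drop) > 0) removeChildren(copy, kids = drop)
  copy
}

trimmed <- lapply(getNodeSet(doc, "//L1[descendant::L2]"), trim_copy)
```

Passing the node objects in kids also sidesteps the fact that removal by name only removes the first child with that name.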
# The same thing as a one-liner:
lapply(xml1["//L1[descendant::L2]"], function(node) removeChildren(node, kids = names(node)[!(names(node) %in% c("ID", "L1N2"))]))
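
As for the XPath-only part of the question: XPath 1.0, which the XML package uses through libxml2, selects nodes that already exist in the tree and cannot build a new, trimmed L1 element. What it can do is restrict the flat selection to L1 nodes that actually contain an L2, which at least drops the ID of the L1 without one. A minimal sketch, assuming a shortened stand-in for xml1:

```r
library(XML)

# Shortened stand-in for xml1: one L1 with an L2 descendant, one without.
doc <- xmlParse('
<L0>
  <L1>
    <ID>Get this ID</ID>
    <L1N1>Ignore node 1</L1N1>
    <L1N2><L2><L2N1>Get this node</L2N1></L2></L1N2>
  </L1>
  <L1>
    <ID>Ignore this ID</ID>
    <L1N1>Ignore node 1</L1N1>
  </L1>
</L0>', asText = TRUE)

# First filter L1 on having an L2 descendant, then pick the children
# that are either ID or a direct parent of L2.
flat <- getNodeSet(doc, "//L1[descendant::L2]/*[self::ID or child::L2]")

# Element names of the selected nodes, in document order.
selected <- vapply(flat, xmlName, character(1))
```

The result is still a flat list of ID and L1N2 nodes, so regrouping them under an L1 parent has to happen in R, as in the removeChildren() approach above.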