XML XPath: selecting the grandparent and specific uncle nodes
I'm using XPath in R on XML with the following structure:
library(XML)
xml1 <- xmlParse('
<L0>
  <L1>
    <ID>Get this ID</ID>
    <L1N1>Ignore node 1</L1N1>
    <L1N2>
      <L2>
        <L2N1>Get this node and all others in L2</L2N1>
      </L2>
    </L1N2>
    <L1N3>Ignore node 3</L1N3>
  </L1>
  <L1>
    <ID>Get this ID</ID>
    <L1N1>Ignore node 1</L1N1>
    <L1N2>
      <L2>
        <L2N1>Get this node and all others in L2</L2N1>
      </L2>
    </L1N2>
    <L1N4>Ignore node 4</L1N4>
  </L1>
  <L1>
    <ID>Ignore this ID</ID>
    <L1N1>Ignore node 1</L1N1>
    <L1N3>Ignore node 3</L1N3>
    <L1N4>Ignore node 4</L1N4>
  </L1>
</L0>
')
I can get the L1 nodes that contain an L2 descendant:
getNodeSet(xml1, "//L1[descendant::L2]")
## [[1]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1>    ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N3>Ignore node 3</L1N3>    ## *Want to exclude this*
## </L1>
##
## [[2]]
## <L1>
##   <ID>Get this ID</ID>
##   <L1N1>Ignore node 1</L1N1>    ## *Want to exclude this*
##   <L1N2>
##     <L2>
##       <L2N1>Get this node and all others in L2</L2N1>
##     </L2>
##   </L1N2>
##   <L1N4>Ignore node 4</L1N4>    ## *Want to exclude this*
## </L1>
I can also select just ID and the element that contains L2:
getNodeSet(xml1, "//L1/*[self::ID | child::L2]")
## [[1]]
## <ID>Get this ID</ID>
##
## [[2]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2>
##
## [[3]]
## <ID>Get this ID</ID>
##
## [[4]]
## <L1N2>
##   <L2>
##     <L2N1>Get this node and all others in L2</L2N1>
##   </L2>
## </L1N2>
##
## [[5]]
## <ID>Ignore this ID</ID>
...but now ID and L2 are separate rather than under L1, and the result also includes the ID element from the third L1 node, which has no L2.
Can XPath return the desired result? If not, is there another way to get it in R?

This seems to be what you want (using your xml1):
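trim <- function(node) {
  # Names of the child elements of this L1 node
  names <- names(node)
  # Everything except ID and L1N2 (the parent of L2) gets dropped
  to.remove <- names[!(names %in% c("ID", "L1N2"))]
  removeChildren(node, kids = to.remove)
}
lapply(xml1["//L1[descendant::L2]"], trim)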
# [[1]]
# <L1>
#   <ID>Get this ID</ID>
#   <L1N2>
#     <L2>
#       <L2N1>Get this node and all others in L2</L2N1>
#     </L2>
#   </L1N2>
# </L1>
#
# [[2]]
# <L1>
#   <ID>Get this ID</ID>
#   <L1N2>
#     <L2>
#       <L2N1>Get this node and all others in L2</L2N1>
#     </L2>
#   </L1N2>
# </L1>
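One thing to watch: as far as I can tell, removeChildren() on nodes from a document parsed with xmlParse() edits the tree in place, so after trimming, xml1 itself no longer contains the removed elements. If the original document is still needed, a possible workaround is to trim a copy of each node. The sketch below is only an illustration under that assumption: trim_copy is a hypothetical helper name, and the XML is a shortened stand-in for the question's document, not the real data.

```r
library(XML)

# Shortened stand-in for the question's document (same element names,
# hypothetical values); built without whitespace between elements.
xml1 <- xmlParse(paste0(
  "<L0>",
  "<L1><ID>a</ID><L1N1>x</L1N1><L1N2><L2><L2N1>keep</L2N1></L2></L1N2><L1N3>y</L1N3></L1>",
  "<L1><ID>b</ID><L1N1>x</L1N1><L1N3>y</L1N3></L1>",
  "</L0>"), asText = TRUE)

# Hypothetical helper: same logic as trim(), but applied to a deep copy
# so that the parsed document is not modified.
trim_copy <- function(node, keep = c("ID", "L1N2")) {
  copy <- xmlClone(node)                      # recursive copy by default
  child_names <- names(copy)
  to_remove <- child_names[!(child_names %in% keep)]
  removeChildren(copy, kids = to_remove)
}

trimmed <- lapply(getNodeSet(xml1, "//L1[descendant::L2]"), trim_copy)
trimmed
```

With this variant, a later query such as getNodeSet(xml1, "//L1N1") should still find the L1N1 elements in the original document.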
The same thing as a one-liner:
lapply(xml1["//L1[descendant::L2]"],
       function(node) removeChildren(node, kids = names(node)[!(names(node) %in% c("ID", "L1N2"))]))
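On the "can XPath do it alone" part: XPath 1.0 only selects nodes that already exist in the document, so it cannot hand back a pruned copy of L1. If a grouped list of the wanted children is enough, though, the second query from the question can be restricted to L1 elements that actually contain L2 and then run once per L1. This is a rough sketch, not the method above; the XML is again a shortened stand-in with hypothetical values, and wanted_flat / wanted_by_L1 are names made up for the example.

```r
library(XML)

# Shortened stand-in for the question's document (hypothetical values).
xml1 <- xmlParse(paste0(
  "<L0>",
  "<L1><ID>a</ID><L1N1>x</L1N1><L1N2><L2><L2N1>keep</L2N1></L2></L1N2><L1N3>y</L1N3></L1>",
  "<L1><ID>b</ID><L1N1>x</L1N1><L1N3>y</L1N3></L1>",
  "</L0>"), asText = TRUE)

# Flat result: the predicate on L1 drops the ID of any L1 without an L2.
wanted_flat <- getNodeSet(xml1, "//L1[descendant::L2]/*[self::ID or child::L2]")

# Grouped result: evaluate the child query relative to each matching L1,
# so ID and the L2 container stay together in one list element per L1.
wanted_by_L1 <- lapply(
  getNodeSet(xml1, "//L1[descendant::L2]"),
  function(l1) getNodeSet(l1, "./*[self::ID or child::L2]")
)

sapply(wanted_flat, xmlName)
```

Neither query changes xml1; the nodes returned are still the originals inside the document, just without their unwanted siblings in the result.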