Python: how do I remove certain nodes from an XML file?
I have an XML file with the following content:
<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
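I need to remove the <remove> nodes under node1, but nodes such as <node9> also contain <remove>, and those should not be removed. How can I do this, perhaps with an awk script or Python?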
The output should be:
<node1>
bla
abc
kkk
</node1>
Using the following input:
$ cat file
<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>
$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
{printf "%s%s",$0,RT}' RS='</node[0-9]+>' file | grep '\S'
<node1>
bla
abc
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>
The script works even when the tags are not each on their own line:
$ cat file
<node1>bla<remove>abc</remove>kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>
$ awk '/<node1>/{gsub(/<[/]?remove>/," ")}
{printf "%s%s",$0,RT}' RS='</node[0-9]+>' file
<node1>bla abc kkk</node1>
<node9>bla<remove>abc</remove>kkk</node9>
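
For readers more comfortable with Python, the record-based idea behind the awk command above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `strip_remove_tags` and its `node` parameter are hypothetical, and like the awk version it treats the file as plain text rather than parsing XML.

```python
import re

def strip_remove_tags(text, node="node1"):
    """Drop <remove> tags (keeping their text) inside <node>...</node> only."""
    # Like awk's RS: split the text into records ending at a closing </nodeN> tag.
    # The capturing group keeps each terminator, playing the role of gawk's RT.
    parts = re.split(r"(</node[0-9]+>)", text)
    out = []
    for i in range(0, len(parts), 2):
        record = parts[i]
        terminator = parts[i + 1] if i + 1 < len(parts) else ""
        if f"<{node}>" in record:
            # Same substitution as gsub(/<[/]?remove>/, " ") in the awk script.
            record = re.sub(r"</?remove>", " ", record)
        out.append(record + terminator)
    return "".join(out)

sample = (
    "<node1>bla<remove>abc</remove>kkk</node1>\n"
    "<node9>bla<remove>abc</remove>kkk</node9>\n"
)
print(strip_remove_tags(sample), end="")
```

Only records that contain the opening `<node1>` tag are modified, which is why the `<remove>` element inside `<node9>` is left alone.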
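You should be aware that modifying XML with text-processing tools is risky. If you must do it, this sed one-liner should work for your example and for the one in sudo's answer: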
sed '/node1>/,/node1>/{/remove>/d}' file
Another option, using awk:
awk '/node1>/,/\/node1>/ {if ($0~/remove>/) $0=""} NF'
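
The range-based one-liners above amount to a small state machine: start dropping `<remove>` tag lines once the opening node1 tag is seen, and stop at its closing tag. A rough Python analogue, assuming every tag sits on its own line as in the question, might look like this. The name `drop_remove_lines` is made up for illustration, and the sketch is not an exact port of sed's range semantics.

```python
def drop_remove_lines(lines, node="node1"):
    """Yield lines, skipping <remove> tag lines between <node> and </node>."""
    inside = False
    for line in lines:
        if f"<{node}>" in line:
            inside = True
        if inside and "remove>" in line:
            continue  # drop the tag line itself; the text lines between are kept
        if f"</{node}>" in line:
            inside = False
        yield line

sample = """<node1>
bla
<remove>
abc
</remove>
kkk
</node1>
<node9>
bla
<remove>
abc
</remove>
kkk
</node9>""".splitlines()

print("\n".join(drop_remove_lines(sample)))
```

Because whole lines are dropped, this approach would discard too much if a `<remove>` tag shared a line with other content.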
I suggest using an XML parser. A good example is BeautifulSoup:
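from bs4 import BeautifulSoup
import sys

# The 'xml' feature requires lxml to be installed.
with open(sys.argv[1], 'r') as f:
    soup = BeautifulSoup(f, 'xml')

# Iterate over a copy: removing elements while walking .children skips siblings.
for elem in list(soup.node1.children):
    if elem.name == 'remove':
        # decompose() also discards the text inside; use elem.unwrap() to keep it.
        elem.decompose()

print(soup)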
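
If installing a third-party package is not an option, the same idea can be sketched with the standard library's `xml.etree.ElementTree`. This is a sketch under stated assumptions rather than a drop-in solution. The sample file has several top-level elements, so it is wrapped in a hypothetical `<root>` element before parsing. The helper `unwrap_children` is made up for illustration and assumes that `<remove>` holds only text, with no nested elements.

```python
import xml.etree.ElementTree as ET

def unwrap_children(parent, tag):
    """Remove direct children named `tag` but keep their text in place."""
    prev = None  # last child that was kept, whose tail receives the text
    for child in list(parent):  # iterate over a copy while removing
        if child.tag != tag:
            prev = child
            continue
        kept = (child.text or "") + (child.tail or "")
        if prev is None:
            parent.text = (parent.text or "") + kept
        else:
            prev.tail = (prev.tail or "") + kept
        parent.remove(child)

data = (
    "<node1>\nbla\n<remove>\nabc\n</remove>\nkkk\n</node1>\n"
    "<node9>\nbla\n<remove>\nabc\n</remove>\nkkk\n</node9>\n"
)

# Assumption: wrap the fragments in a synthetic <root> so they parse as one document.
root = ET.fromstring(f"<root>{data}</root>")
for node in root.findall("node1"):
    unwrap_children(node, "remove")

# Print the original top-level elements without the synthetic wrapper.
print("".join(ET.tostring(n, encoding="unicode") for n in root), end="")
```

Unlike `decompose()`, this keeps the text `abc` inside node1, which matches the output requested in the question. Blank lines left behind by the removed tags are not cleaned up.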
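+1, this is a good approach if the tags are on separate lines as posted. You should point out that this only works with gawk, since RS is more than one character.
@Jotne Yes, that is what I originally meant to say; I must have been distracted.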