Editing and removing XML tags with sed


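I'm new to the great editor called sed.

I want to remove all the XML tags and extract the string between one specific pair of tags, reportBody.

Here is what the input looks like on a single line: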

<?xml version="1.0" ?><SOAP- ENV:Envelope xmlns:SOAP-ENV="blablah"><SOAP-ENV:Body> <getReportResponsexmlns:msgns="blahblahblah" xmlns="blahblah"><returnxmlns=""> <returnCode><majorReturnCode>000</majorReturnCode><minorReturnCode>0000</minorReturnCode><returnCode><reportName>blahblah</reportName><reportTitle>blahblahblahr</reportTitle><reportBody>STRING TO EXTRACT</reportBody><reportMimeType>text/csv</reportMimeType></return></getReportResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>


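The problem is that the XML file can vary: sometimes it is written on one line, sometimes on 2-3 lines, and sometimes the string to extract is spread over several lines between the reportBody tags. So it can look like this, or be different again: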

    <?xml version="1.0" ?><SOAP- ENV:Envelope xmlns:SOAP-ENV="blablah"><SOAP-ENV:Body> 
<getReportResponsexmlns:msgns="blahblahblah" xmlns="blahblah">
<returnxmlns=""> <returnCode>
<majorReturnCode>000</majorReturnCode><minorReturnCode>0000</minorReturnCode>
<returnCode>
<reportName>blahblah</reportName><reportTitle>blahblahblahr</reportTitle><reportBody>
STRING 
TO 
EXTRACT</reportBody>
<reportMimeType>text/csv</reportMimeType></return>
</getReportResponse></SOAP-ENV:Body></SOAP-ENV:Envelope>



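What solution would cope with all of these variations?

Also, can I set parameters to save the result to a file and decode the string from base64? Thanks.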

You can use this GNU awk command to extract it:

awk -v RS='<reportBody>.*</reportBody>' 'RT{print RT}' file.xml
<reportBody>
STRING
TO
EXTRACT</reportBody>


With the first (single-line) input, you get this output:

<reportBody>STRING TO EXTRACT</reportBody>


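-v RS='<reportBody>.*</reportBody>' sets the input record separator to the text running from <reportBody> to </reportBody>. GNU awk stores whatever text matched RS in the RT variable, which is what gets printed.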
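To see how RS and RT interact, here is a small sketch, assuming GNU awk is installed as gawk (RT is a gawk extension, so mawk or BusyBox awk will not behave this way). The function name show_rt and the sample input are made up for illustration.

```shell
#!/usr/bin/env bash
# show_rt: read XML on stdin and print the text that matched the RS regex.
# In gawk, "." also matches a newline, so the match may span several lines.
show_rt() {
  gawk -v RS='<reportBody>.*</reportBody>' 'RT { print RT }'
}

# Hypothetical sample: the tag content is spread over several lines.
printf '<a>x</a><reportBody>\nSTRING\nTO\nEXTRACT</reportBody><b>y</b>\n' | show_rt
```

This prints the whole element, tags included, on four lines. The final record (`<b>y</b>`) has an empty RT because it is ended by end of input, so the `RT` pattern skips it. Note that `.*` is greedy: if a file held several reportBody elements, the match would run from the first opening tag to the last closing tag.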

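Use:

awk -v RS='<reportBody>.*</reportBody>' 'RT{
     gsub(/<\/?reportBody>[[:space:]]*/, "", RT); print RT}' file.xml

if you want to extract only the string inside the tags.

Good idea, but this string may contain more characters than the maximum string length, so an XML parser cannot handle it.

Unfortunately, the XML in your question is invalid; I suspect that is just a side effect of preparing the example for us. If it is valid, then as @123 suggested, you should really use a tool like

xmllint --xpath '//reportBody/text()' file.xml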
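For the follow-up about saving to a file and decoding base64, the awk output can probably just be piped onward. A minimal sketch, assuming GNU awk as gawk and a coreutils-style base64 that accepts -d (older macOS versions use -D instead); the function name decode_report, the file names, and the sample payload are hypothetical.

```shell
#!/usr/bin/env bash
# decode_report IN OUT: extract the <reportBody> content from IN,
# base64-decode it, and write the decoded bytes to OUT.
decode_report() {
  gawk -v RS='<reportBody>.*</reportBody>' 'RT {
      # Drop the tags and every whitespace character, leaving only the payload.
      gsub(/<\/?reportBody>|[[:space:]]/, "", RT)
      print RT
  }' "$1" | base64 -d > "$2"
}

# Hypothetical sample: YSxiCjEsMgo= is the base64 encoding of "a,b\n1,2\n".
tmp=$(mktemp -d)
printf '<r><reportBody>\n%s\n</reportBody></r>\n' 'YSxiCjEsMgo=' > "$tmp/file.xml"
decode_report "$tmp/file.xml" "$tmp/report.csv"
cat "$tmp/report.csv"
```

Stripping all whitespace is safe here only because base64 text never contains meaningful whitespace; for a plain-text reportBody, keep the original gsub that removes the tags alone.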