Linux: Using sed to extract XML content from a log file and dump each result into a different file


I have the following 10 GB log file, which I need to analyze directly on a Unix server:


2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message1
2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message2
2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message3
2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message4
2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message5
2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG some message6
2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG <xml>
<!-- id is not unique since the XML data provides all the
information of an object X defined by its id at a specific point in time -->
some XML content on more than 500 lines
</xml>
2017-12-12 13:04:30,330 [ABC] [DEF] DEBUG some message8
2017-12-12 13:04:30,333 [ABC] [DEF] DEBUG some message9
2017-12-12 13:04:30,334 [ABC] [DEF] INFO some message10
2017-12-12 13:04:30,334 [ABC] [DEF] INFO some message11
2017-12-12 13:04:31,431 [ABC] [DEF] INFO some message12
2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG <xml>
some XML content on more than 500 lines
</xml>
2017-12-12 13:04:31,432 [ABC] [DEF] DEBUG some message13
2017-12-12 13:04:31,476 [ABC] [DEF] INFO some message14
2017-12-12 13:04:31,476 [ABC] [DEF] DEBUG some message14
2017-12-12 13:04:31,490 [ABC] [DEF] DEBUG some message15
2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG <xml>
some XML content on more than 500 lines
</xml>
2017-12-12 13:04:31,491 [ABC] [DEF] DEBUG some message16
2017-12-12 13:04:31,491 [ABC] [DEF] DEBUG some message17
2017-12-12 13:04:31,496 [ABC] [DEF] DEBUG some message18
2017-12-12 13:04:31,996 [ABC] [DEF] INFO some message19

sed -r 's~(<xml>…<id>(.*)</id>…</xml>)~echo "\1" >> \2.out~e' file.in #just a prototype


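First, identify the start tag in the log and set a test variable to yes. Then store each line of the XML in a buffer variable and, as soon as I have the whole block, dump it into a file $i.out and, of course, reset the test variable to no. If you have a better solution using awk, or a sed solution in which I can access a variable holding the number of the pattern currently being processed and reuse it to generate the output file name, that would be great. Something like a current_pattern_position variable used to generate the file file_$current_pattern_position.out.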
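The flag-and-buffer plan described above can be sketched in plain POSIX sh. This is a minimal illustration under assumptions, not code from the thread: the function name extract_xml_blocks, the file_<n>.out naming and the sample log are hypothetical, and each block is assumed to start on a line containing <xml> and to end on a line containing </xml>. A shell read loop is slow, so on a 10 GB file the awk and perl one-liners in the answers are likely to be much faster.

```shell
#!/bin/sh
# Minimal sketch of the question's "test variable + buffer" plan.
# Assumptions (hypothetical, not from the thread): a block starts on a line
# containing "<xml>" and ends on a line containing "</xml>"; block number n
# is written to file_n.out.

extract_xml_blocks() {
    in_xml=no   # the "test variable"
    buffer=     # XML lines collected so far
    n=0         # plays the role of current_pattern_position
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$in_xml" = no ]; then
            case $line in
                *'<xml>'*)
                    # Start tag found: set the flag and start a new buffer,
                    # dropping the log prefix in front of <xml>.
                    in_xml=yes
                    n=$((n + 1))
                    rest=${line#*'<xml>'}
                    buffer="<xml>$rest"
                    ;;
            esac
        else
            # Inside a block: append the line to the buffer.
            buffer="$buffer
$line"
        fi
        if [ "$in_xml" = yes ]; then
            case $line in
                *'</xml>'*)
                    # End tag found: dump the buffer and reset the flag.
                    printf '%s\n' "$buffer" > "file_$n.out"
                    buffer=
                    in_xml=no
                    ;;
            esac
        fi
    done
}

# Hypothetical sample input shaped like the log in the question.
cat > sample.log <<'EOF'
2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message1
2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG <xml>
some XML content
</xml>
2017-12-12 13:04:30,330 [ABC] [DEF] DEBUG some message8
2017-12-12 13:04:31,431 [ABC] [DEF] DEBUG <xml>
more XML content
</xml>
EOF

extract_xml_blocks < sample.log
```

On the sample this leaves file_1.out and file_2.out, each holding one block from <xml> to </xml> without the log prefix. As the comments on the answers point out, the buffer is not strictly required: each line could be appended to the current file directly.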



perl -ne 'if(s/.*(?=<xml>)//){$x++;open$fh,">file$x.xml"}if($fh){print$fh $_}if(/<\/xml>/){close$fh;undef$fh}' input.txt



GNU Awk solution:

awk -v RS='<xml>|</xml>' '!(NR%2){
           gsub(/^[[:space:]]*|[[:space:]]*$/, "");
           out = "file" (++c) ".xml";
           printf "<xml>\n%s\n</xml>\n", $0 > out;
           close(out);
       }' file

$ head file*.xml
==> file1.xml <==
<xml>
<!-- id is not unique since the xml data provides all the
information of an object X defined by its id at a specific point in time -->
some xml content on more than 500 lines
</xml>

==> file2.xml <==
<xml>
some xml content on more than 500 lines
</xml>

==> file3.xml <==
<xml>
some xml content on more than 500 lines
</xml>


#!/bin/sed -nf

# Execute the following group of commands for each line in the XML node to
# generate a series of shell commands that we'll feed into an interpreter:
/<xml>/,/<\/xml>/ {
    # Extract the ID number to generate a command that changes the output file:
    /^<id>\([0-9]\+\)<\/id>$/ {
        # Using the same pattern as above, substitute the ID number into a
        # command that updates the current output file and increments a counter
        # for the ID that we'll append as the filename extension:
        s//c\1=$(( c\1 + 1 )); exec > "file\1.$c\1"/
        # Output the generated command:
        p
        # Then, proceed to the next line:
        n
    }
    # Output any remaining lines in the XML block except for the <xml> tags:
    /<xml>\|<\/xml>/ !{
        # Escape any single quotes in the XML content (so we can wrap it in a
        # shell command below):
        s/'/'"'"'/g
        # Generate a command that will write the line to the current file:
        s/^.*$/echo '&'/
        # Output the generated command:
        p
    }
}

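It works, but it shows that sed is not the best tool for this problem. sed's simple language isn't designed for this kind of logic, so the code isn't pretty, and we rely on the shell to create the files, which adds some overhead. If you have to use sed, expect the job to take longer; for performance-critical problems, consider one of the tools described in the other answers.

Based on the information and example in the question, I assume that we don't want the opening and closing tags in the output, and that the ID is always a number on its own line. The implementation writes to file names with a numeric extension that is incremented whenever a duplicate ID is found (fileID.count: file1.1, file1.2, and so on). Changing these details should be easy if needed.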
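The sed script never writes files itself: it prints shell commands, and sh does the file switching. As a hedged illustration, for a hypothetical <id>42</id> that appears in two XML blocks, the generated stream would have roughly the shape below (the id and the XML lines are invented). Running it with sh leaves file42.1 and file42.2, matching the fileID.count naming; the '"'"' sequence is what the quote-escaping step produces so that a single quote survives inside the single-quoted echo argument.

```shell
#!/bin/sh
# Hypothetical command stream in the shape the sed script emits for two XML
# blocks that share <id>42</id>. Nothing here is taken from a real log.
cat > commands.sh <<'EOF'
c42=$(( c42 + 1 )); exec > "file42.$c42"
echo '<value>first snapshot</value>'
c42=$(( c42 + 1 )); exec > "file42.$c42"
echo '<note>it'"'"'s the second snapshot</note>'
EOF

# c42 starts unset (treated as 0), so the first block goes to file42.1.
# Each "exec >" redirects the rest of the script's output, so the echo lines
# that follow land in file42.1, then in file42.2.
sh commands.sh
```

This extra stage is where the overhead mentioned above comes from: every XML line becomes a separate echo command that the shell has to parse and run.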

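Note: the revision history contains a version that uses GNU sed and another that uses a wrapper script, both removed for brevity, in case you need them. They work, but they are either too slow or too complex.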

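Thanks for your answer, but this is exactly what I suggested in my post: extract each XML message and dump it into a separate file.

There is no buffering here. But yes, that is generally the right approach.

@Allan Why use an in-memory buffer when you don't need one?

Thanks, I added the modification. sub/gsub as a condition is nice. By the way, do you think it is possible to do this with sed?

This accumulates open file descriptors, which may become a problem after roughly 1020 files (see ulimit -n and ulimit -a). Calling close("file"c".xml") will fix this.

Thanks for your answer. By the way, do you think this can be done with sed?

sed is not designed to write to multiple files. It can obviously be done with GNU sed using echo + >>, but that is probably not ideal because the file is opened for every line; the same goes for awk. I think perl is more appropriate.

As stated in the question, any other sexy solution using any Unix command or tool is of course welcome.

I'm not sure what you mean by "the same goes for awk": awk is no less efficient than perl at handling output files, and it certainly does not open an output file per line.

Actually, looking at the system calls, you can see that with the awk solution all output files stay open, whereas in perl they can be closed explicitly.

Please clarify this so that nobody wastes time posting something you don't want: do you really want a solution that uses only one sed script, equivalent to the awk and perl solutions you have so far, or a bash solution that may use multiple sed calls and other tools? If it is sed only, should it be portable, or can it be specific to one sed variant such as GNU sed?

Ideally a single sed script; if that is not possible, multiple calls are fine. GNU sed is fine. Thanks for your help! Let me know if you need more information.

Unless you want to do some hacking with GNU sed's e command, which ultimately boils down to stuffing a shell script into sed, I don't think you can write to multiple different files with a single sed invocation.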
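The point that sed is not designed to write to multiple files can be seen with sed's own w command: it is standard sed and does write to a file, but the file name is a literal fixed in the script, so it cannot follow a counter or an <id> value. In this sketch (the sample data and the all_blocks.xml name are hypothetical) both XML ranges end up in the same file; it illustrates the limitation and is not a solution.

```shell
#!/bin/sh
# Hypothetical two-block sample in the shape of the question's log.
printf '%s\n' \
    '2017-12-12 13:04:28,716 [ABC] [DEF] DEBUG some message1' \
    '2017-12-12 13:04:28,732 [ABC] [DEF] DEBUG <xml>' \
    'first block' \
    '</xml>' \
    '2017-12-12 13:04:30,330 [ABC] [DEF] DEBUG some message8' \
    '2017-12-12 13:04:31,431 [ABC] [DEF] DEBUG <xml>' \
    'second block' \
    '</xml>' > sample.log

# "w FILE" takes a literal file name that is fixed when the script is parsed;
# it cannot be built from a counter or from the <id> value, so both XML
# ranges are appended to the same output file.
sed -n '/<xml>/,/<\/xml>/w all_blocks.xml' sample.log
```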

Another awk solution:

awk 'sub(/.*<xml>/,"<xml>") {out="file" ++i ".xml"; p=1}
     p {print > out}
     /<\/xml>/ {p=0; close(out)}
' file
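
To run the #!/bin/sed -nf script, save it as parse_log.sed and pipe the generated commands into a shell:

$ sed -nf parse_log.sed < file.in | sh

The same thing as a one-liner: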
sed -n '/<xml>/,/<\/xml>/ {                             
    /^<id>\([0-9]\+\)<\/id>$/{s//c\1=$(( c\1 + 1 ));exec > "file\1.$c\1"/;p;n;}
    /<xml>\|<\/xml>/!{'"s/'/'\"'\"'/g;"'s/^.*$/echo '"'&'"'/;p;}                
}' < file.in | sh