How to combine files like 'diff --unified'?

I am looking for a solution that merges two or more input files into one output file. It should work much like "diff -u999999 file1.txt file2.txt > output.txt", but without the diff indicators.
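
For reference, the literal behaviour asked for can be approximated by post-processing the diff output itself. The following is only a minimal sketch: the function name `union_like_diff` is made up for illustration, and it assumes a `diff` that accepts `-U NUM` and inputs shorter than the context size, so that the whole comparison lands in a single hunk. Lines that differ between the files appear from both sides (removed lines first, then added ones).

```shell
# union_like_diff FILE1 FILE2   (hypothetical helper name)
# Prints the lines of both files in diff's alignment order, without
# the leading ' ', '+' or '-' indicator column.
union_like_diff() {
    if cmp -s "$1" "$2"; then
        # Identical files: diff prints nothing, so emit one copy.
        cat "$1"
        return
    fi
    # 1,3d    drops the '---', '+++' and '@@' header lines
    # /^\\/d  drops "\ No newline at end of file" notes
    # s/^.//  strips the one-character indicator column
    diff -U 999999 "$1" "$2" | sed -e '1,3d' -e '/^\\/d' -e 's/^.//'
}
```

It would be called as `union_like_diff file1.txt file2.txt > output.txt`. Because it relies on diff's line alignment, it may get slow on very large inputs.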

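This is a script I wrote a while ago to merge a set of log files. I first started out using kdiff3 by hand, which works fine for small files, but as the accumulated logs grew larger it became painfully slow.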

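Our logs contain the output of printf("time(NULL) = %d\n", time(NULL)) at regular intervals; for other logs you will have to adapt the script to look for some other monotonically increasing synchronization marker.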

#!/usr/bin/perl 
use strict;
use warnings;

# This program takes two overlapping log files and combines
# them into one, e.g.
#
#          INPUT:                    OUTPUT:
#
#   file1        file2              combined
#    AAA                               AAA
#    AAA                               AAA
#    AAA                               AAA
#    BBB          BBB                  BBB
#    BBB          BBB                  BBB
#    BBB          BBB                  BBB
#                 CCC                  CCC
#                 CCC                  CCC
#                 CCC                  CCC
#                 CCC                  CCC
#

# This program uses the "time(NULL) = <...time...>" lines in the
# logs to match where the logs start overlapping.

# Example line matched with this function:
# time(NULL) = 1388772638
sub get_first_time_NULL {
    my $filename = shift;
    my $ret = undef;
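    # Three-argument open with a lexical handle; fail loudly if unreadable
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!\n";
    while (my $line = <$fh>) {
        if ($line =~ /^time\(NULL\) = (\d+)/) {
            $ret = $1;
            last;
        }
    }
    close($fh);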
    return $ret;
}

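die "Usage: $0 <logfile1> <logfile2>\n" unless @ARGV == 2;

my $F1_first_time = get_first_time_NULL($ARGV[0]);
my $F2_first_time = get_first_time_NULL($ARGV[1]);

# Without a marker in both files there is nothing to synchronize on
die "No \"time(NULL) = ...\" line found in $ARGV[0]\n" unless defined $F1_first_time;
die "No \"time(NULL) = ...\" line found in $ARGV[1]\n" unless defined $F2_first_time;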

my $oldest_file;
my $newest_file;
my $newest_file_first_time;

if ($F1_first_time <= $F2_first_time) {
    $oldest_file = $ARGV[0];
    $newest_file = $ARGV[1];
    $newest_file_first_time = $F2_first_time;
} else {
    $oldest_file = $ARGV[1];
    $newest_file = $ARGV[0];
    $newest_file_first_time = $F1_first_time;
}

# Print the "AAA" part
open(my $old_fh, '<', $oldest_file) or die "Cannot open $oldest_file: $!\n";
while (my $line = <$old_fh>) {
    print $line;
    last if ($line =~ /^time\(NULL\) = $newest_file_first_time/);
}
close($old_fh);

# Print the "BBB" and "CCC" parts
my $do_print = 0;
open(my $new_fh, '<', $newest_file) or die "Cannot open $newest_file: $!\n";
while (my $line = <$new_fh>) {
    print $line if $do_print;
    $do_print = 1 if ($line =~ /^time\(NULL\) = $newest_file_first_time/);
}
close($new_fh);
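
For logs that use a different synchronization marker, the same cut-and-append idea can be tried out quickly in shell and awk. This is a rough sketch, not the script above: the function name `merge_by_marker` and the `T=` prefix in the usage line are invented for illustration, and unlike the Perl script it assumes the older file is passed first and compares the marker line exactly rather than by regex.

```shell
# merge_by_marker PREFIX OLDER NEWER   (hypothetical helper)
merge_by_marker() {
    # The first line of NEWER that starts with PREFIX is the sync marker.
    marker=$(PREFIX=$1 awk 'index($0, ENVIRON["PREFIX"]) == 1 { print; exit }' "$3")
    if [ -z "$marker" ]; then
        echo "no line starting with '$1' in $3" >&2
        return 1
    fi
    # OLDER: everything up to and including the marker line.
    # (Appending "" forces a string comparison in awk.)
    MARKER=$marker awk 'BEGIN { m = ENVIRON["MARKER"] "" }
        { print }
        $0 == m { exit }' "$2"
    # NEWER: everything after the marker line.
    MARKER=$marker awk 'BEGIN { m = ENVIRON["MARKER"] "" }
        seen { print }
        $0 == m { seen = 1 }' "$3"
}
```

For example, `merge_by_marker 'T=' old.log new.log > combined.log`. The values are passed through the environment rather than `awk -v` so that backslashes in log lines are not interpreted as escape sequences.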

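Do file1.txt and file2.txt overlap? (For example, file1.txt starts with unique content followed by content identical to file2.txt, while file2.txt starts with the common part followed by unique content.) The reason I ask is that if so, I think I have a solution.

Actually, I want to merge two log files that may have overlapping parts, which should appear only once.
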
#!/bin/sh

# This script combines several overlapping logfiles into one
# continuous one. See merge_log_files.pl for more details on
# how the logs are merged; this script is only glue to process
# multiple files in one operation.

set -e

MERGE_RESULT="$1"
shift

echo "Processing $1..."
cp "$1" MeRgE.TeMp.1
shift

while [ "$#" -gt 0 ]
do
    if [ ! -s "$1" ]
    then
        echo "Skipping empty file $1..."
        shift
        continue
    fi
    echo "Processing $1..."
    # The Perl script is expected next to this one, with .sh replaced by .pl
    perl "$(printf '%s\n' "$0" | sed 's/\.sh$/.pl/')" MeRgE.TeMp.1 "$1" > MeRgE.TeMp.2 && mv MeRgE.TeMp.2 MeRgE.TeMp.1
    shift
done

mv MeRgE.TeMp.1 "$MERGE_RESULT"
echo "Done"
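
The loop above finds the Perl script by taking the glue script's own path ($0) and replacing a trailing .sh with .pl, so both files are expected to sit side by side under the same base name (going by the header comment, presumably merge_log_files.sh next to merge_log_files.pl). A small sketch of that mapping using plain parameter expansion; the function name `companion_pl` is hypothetical and only for illustration:

```shell
# companion_pl PATH   (hypothetical helper)
# Maps "dir/name.sh" to "dir/name.pl"; any other path is returned
# unchanged, which mirrors what sed 's/\.sh$/.pl/' does.
companion_pl() {
    case $1 in
        *.sh) printf '%s\n' "${1%.sh}.pl" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}
```

With that layout, a run would look like `sh merge_log_files.sh combined.log a.log b.log c.log` (the log names are placeholders): the first argument is the output file, and the remaining arguments are the logs, oldest first.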