Using sed to generate multiple files whose names depend on the regex pattern searched


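Basically, I have a file that is in the wrong order, and I would like to reconstruct it. What the pieces have in common is a section pattern P which, as a regular expression, is

^\d\{1,2}\.\d\{1,2}\.\d\{1,2}\.

i.e. something like

1.2.3.

followed by some text.

What I would like is to output these section blocks into separate files, so that I can rebuild them in order.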

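My strategy so far (since I cannot yet handle multi-line regular expressions) is:

(1) Replace each pattern P with the same pattern preceded by 6 & characters (as a marker).

(2) Then replace all newlines with some other marker pattern, say ###.

(3) Then (and this is where I have my problem), using sed, search for anything of the form &&&P[any characters]&&&&&& and output it to a file whose name has the numeric value of P appended (i.e. file1.2.3).

(4) Merge these files back into one file in the correct order (not quite sure how to do this, but it is not what is blocking me).

(5) Remove the & markers, and replace ### with newlines.

I realise this is possibly an inefficient way of doing it but, as far as my knowledge goes, were it not for step (3) or maybe (4), I believe I would at least achieve my aim.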

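As for (3), I have tried derivatives of the following:

sed s/\\(\&\&\&\&\)\(^\d\{1,2}\.\d\{1,2}\.\\\.\\\\)\(.\&&&&&/\\)/\1\2\3/file\2

I am trying to use the matched group \2 as the extension of my new file; and, well, it isn't working at all.

Note: I use 6 & characters as the marker so that I do not accidentally match other patterns in the text.

Any help would be greatly appreciated.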

I think you may be asking too much of sed. Your approach might work, but perl offers some tools that suit this job:

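use strict;
use warnings;

# Lines seen before the first section header go to out.pre.txt.
my $section = "pre";
while ( my $line = <> ) {
    # A header starts the line, e.g. "1.2.3"; capture it as the section name.
    if ( $line =~ /^(\d{1,2}\.\d{1,2}\.\d{1,2})/ ) {
        $section = $1;
    }
    # Append every line (header included) to the current section's file.
    open( my $fh, ">>", "out.$section.txt" )
        or die "Cannot open out.$section.txt: $!";
    print $fh $line;
    close $fh;
}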


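This is the brute-force approach: I open and close the file handle inside the while loop, which is very inefficient. For something that will be run a few times on files of fewer than 10,000 lines, it is good enough. Note that this solution appends data to each file, so if you want to run it again you must clear out all the files first.

It would be better to build a hash of all the possible output file names, with an array of lines for each file name. The files could then be collated and written out one file at a time.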
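The idea in the last paragraph can be sketched roughly as follows. This is only an illustration under some assumptions: it uses awk called from a shell function rather than Perl, it keeps one concatenated string per section instead of an array of lines, and the names split_sections, out.<section>.txt and pre are made up for the example.

```shell
# split_sections FILE: collect the lines of FILE per section in memory, then
# write each section to out.<section>.txt exactly once.
# Lines before the first header would go to out.pre.txt (hypothetical name).
split_sections() {
    awk '
        BEGIN { section = "pre" }
        # A header line starts with three dot-separated numbers, e.g. "2.1.9".
        match($0, /^[0-9]+\.[0-9]+\.[0-9]+/) {
            section = substr($0, RSTART, RLENGTH)
        }
        # File every line (header included) under the current section.
        { body[section] = body[section] $0 "\n" }
        END {
            for (s in body) {
                file = "out." s ".txt"
                printf "%s", body[s] > file   # opened once, truncating old runs
                close(file)
            }
        }
    ' "$1"
}
```

Called as split_sections input.txt, this opens each output file once instead of once per input line, and because it truncates rather than appends, the output files do not need to be cleared before a second run.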

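I think it is fair to say that sed is not the right tool for this task. With enough effort it could perhaps be made to do it, but it really isn't fair to ask that of it.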
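One concrete obstacle, as far as I understand sed: the file name given to the w flag is taken literally up to the end of the line, so a back-reference such as \1 is not expanded there. The small sketch below tries it anyway; the function name demo_sed_w and the sample input are invented for the illustration.

```shell
# demo_sed_w DIR: run a sed command inside DIR that tries to use capture
# group 1 as part of the output file name, then list what was created.
demo_sed_w() {
    (
        cd "$1" || exit 1
        printf '1.2.3. first\n4.5.6. second\n' |
            sed -n 's/^\([0-9.]*\) .*/&/w file\1'
        ls
    )
}
```

On the sed implementations I am aware of, both lines end up in a single file whose name is the literal text file\1, rather than in file1.2.3. and file4.5.6., which is why the attempt in the question cannot work as written.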

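Perl (or Python) is a reasonable alternative. I am more fluent in Perl than in Python, so I would use it.

Also, with Perl you might not even need to send the output to multiple files, unless the document is hundreds of megabytes in size.

I am reading between the lines a bit, but I think your input document looks like this: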

2.1.9
...multiple lines of material for section 2.1.9...
1.3.6
...multiple lines of material for section 1.3.6...
9.1.3
...multiple lines of material for section 9.1.3...

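except that the sections are not presented in order. It is not critical to my suggestion that the section marker be on a line of its own; if there is text on the same line, it changes things slightly.

In outline, the code should look like this: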

my $current_section = "0.0.0";
my %section_list = ();
my $section_material = "";

while (<>)
{
    if (m/^(\d+\.\d+\.\d+)/)
    {
        # Found a new section...stash the old one...
        if ($section_material ne "")
        {
            # If the same section number appears twice, simply concatenate
            # the new material after the old.  Or you can get more complex,
            # using an array of refs to section material...
            $section_list{$current_section} = ""
                if !defined $section_list{$current_section};
            $section_list{$current_section} .= $section_material;
        }
        # Switch to the new section even when nothing was pending, so that
        # a header on the very first line is not filed under "0.0.0".
        $current_section = $1;
        $section_material = "";
    }
    $section_material .= $_;
}
if ($section_material ne "")
{
    $section_list{$current_section} = ""
        if !defined $section_list{$current_section}; 
    $section_list{$current_section} .= $section_material;
}

# Now the hash %section_list contains all the material.
# You need a section number comparison function that can be used with sort
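sub section_cmp
{
    # Compare the section numbers field by field, numerically,
    # so that 10.1.1 sorts after 9.1.1.
    my @a = split /\./, $a;
    my @b = split /\./, $b;
    for my $i (0 .. 2)
    {
        my $cmp = ($a[$i] // 0) <=> ($b[$i] // 0);
        return $cmp if $cmp;
    }
    return 0;
}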

foreach my $section (sort section_cmp keys %section_list)
{
     print "[$section]\n";
     print "$section_list{$section}\n";
}

You could look up v-numbers, or find a 'version comparison' module that would get this job done more quickly.
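If the section numbers are available one per line outside Perl, the same field-by-field ordering can be sketched with the standard sort utility. This is a substitute for the Perl comparison function, not part of it, and the function name sort_sections is made up for the example.

```shell
# sort_sections: read section numbers (one per line) on stdin and print them
# ordered numerically by first, second and third dot-separated field.
sort_sections() {
    sort -t . -k 1,1n -k 2,2n -k 3,3n
}
```

For example, printf '2.1.9\n10.1.3\n9.1.3\n' | sort_sections puts 10.1.3 last, whereas a plain string sort would put it first.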

awk can do this job quite elegantly:

#!/usr/bin/awk -f

# Put anything before the first section somewhere so we don't lose it.
BEGIN { section = "pre" }

# When we hit a new section, change to that section. Record the section name
# in a file, for sorting later. $1 is the section number even when text
# follows it on the same line.
/^([0-9]{1,2}\.){3}/ { print (section = $1) >> "sections" }

# Print the line into the current working file
{ print >> section }

Now, after running this, each section sits in its own file, named after the section. Let's put them back together:

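# start from an empty output file, then add the preamble if there was any
: > full
[ -f pre ] && cat pre >> full

# sort has a -V option to sort version numbers, which is what you want
# (GNU sort); -u drops repeated section names so no file is added twice.
sort -uV sections | while IFS= read -r file; do cat "$file" >> full; done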

That's it. You now have the full file, sorted by section, with any preamble still at the top.

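Thank you very much for your reply, Barton. I do not yet fully understand the finer details of your code (I am really new to perl, for now), but I will try it out tomorrow (simplifying the file for testing purposes) and let you know how I get on. Regarding the inefficiency you mention, am I right in saying the algorithm is of order n^2 (for n lines)? Your other solution looks good but, again, it seems I have some learning to do before I can implement it.

No, it should be O(N)... it is just that opening and closing the file handle can be slow, because each time you do so you (probably) hit the disk.

I edited the question to add code markup. I suggest you provide some sample input and the expected output, because I am having a hard time guessing what you want to achieve.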
