Shell: printing the text between two lines in Unix (line numbers taken from a list in a file)

shell unix sed awk

I have a sample file with thousands of lines. I want to print the text between two line numbers in that file. I don't want to type the line numbers by hand; instead, I have a file containing a list of line-number pairs, and the text between each pair has to be printed.

Example: linenumbers.txt

345|789
999|1056
1522|1366
3523|3562

I need a shell script that reads the line numbers from this file and prints the text between each pair of lines to a separate (new) file.

That is, it should print the lines between 345 and 789 to a new file, say File1.txt, the text between 999 and 1056 to another new file, say File2.txt, and so on.

To extract the first field from a line such as

345 | 789

you can use awk:

awk -F'|' '{print $1}'

Combined with the answers to other questions, that gives you a solution.

Given that your target file has only a few thousand lines, here is a quick and dirty solution:

awk -F'|' '{system("sed -n \""$1","$2"p\" targetFile > file"NR)}' linenumbers.txt

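targetFile is the file containing the thousands of lines.
The one-liner does not require linenumbers.txt to be sorted.
The one-liner allows overlapping line ranges in linenumbers.txt.

After running the command above you will have n files named filex, where n is the number of lines in linenumbers.txt and x runs from 1 to n.
You can change the file-name pattern as needed.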
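
To make the behaviour described above easy to try out, here is a minimal sketch that wraps the same awk + sed idea in a function and runs it on a small generated file. The function name split_ranges, its three parameters and the scratch file names are made up for illustration; only the awk/system/sed pattern comes from the one-liner.

```shell
#!/bin/sh
# Hypothetical helper: split_ranges TARGET RANGES PREFIX
# For each "START|END" line of RANGES, write lines START..END of TARGET
# to PREFIX1, PREFIX2, ... (one sed run per range, as in the one-liner).
split_ranges() {
    awk -F'|' -v target="$1" -v prefix="$3" '
        { system("sed -n \"" $1 "," $2 "p\" \"" target "\" > \"" prefix NR "\"") }
    ' "$2"
}

# Small demonstration in a scratch directory (all names are illustrative).
demo_dir=$(mktemp -d)
seq 1 20 > "$demo_dir/targetFile"               # 20 lines: 1, 2, ..., 20
printf '%s\n' '3|5' '4|8' > "$demo_dir/ranges"  # unsorted or overlapping is fine
split_ranges "$demo_dir/targetFile" "$demo_dir/ranges" "$demo_dir/file"
cat "$demo_dir/file1"   # lines 3 to 5 of targetFile
cat "$demo_dir/file2"   # lines 4 to 8 of targetFile
rm -rf "$demo_dir"
```

Because sed is started once per range, the target file is read once per line of the ranges file, which is fine for a few thousand lines.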
You can do the following:
# myscript.sh
linenumbers="linenumbers.txt"
somefile="afile"
while IFS=\| read -r start end ; do
    echo "sed -n '$start,${end}p;${end}q;' $somefile  > $somefile-$start-$end"
done < "$linenumbers"

Then, when you are happy with the output, run sh myscript.sh | sh

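EDIT: added William's excellent points on style and correctness.
Explanation
The basic idea is to have a script generate a series of shell commands, which can be checked for correctness before they are executed by "| sh".
sed -n '345,789p;789q' means: run sed without echoing every line (-n), with two commands. The first p(rints) lines 345 through 789; the second q(uits) at line 789. Quitting at the last line of interest saves reading the rest of the input file.
The while loop uses read to read from the $linenumbers file. When read is given several variable names, each one is filled with a field from the input; fields are normally separated by whitespace. If there are too few variable names, read puts the remaining data into the last one.
You can type the following at a shell prompt to see that behaviour:
ls -l | while read first rest ; do
   echo $first XXXX $rest
done

Try adding another variable, second, to the above and see what happens; it should be obvious.
The problem is that the data is delimited by | characters, which is where William's suggestion of IFS=\| comes in: for that read, IFS is changed, the input is split on | characters, and we get the desired result.
Others should feel free to edit, correct and expand.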
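
The generate-then-review idea and the IFS handling explained above can be tried without any input files. This is only a sketch: the function name gen_commands and the sample ranges are made up, and the data file name afile is the placeholder used in the script above.

```shell
#!/bin/sh
# Hypothetical helper: read "START|END" lines on standard input and print
# one sed command per line. Nothing is executed here; the commands are
# only printed so they can be inspected first.
gen_commands() {
    # $1 = name of the data file the generated commands will read
    while IFS='|' read -r start end; do
        printf "sed -n '%s,%sp;%sq' %s > %s-%s-%s\n" \
            "$start" "$end" "$end" "$1" "$1" "$start" "$end"
    done
}

# Review the generated commands for two sample ranges.
printf '%s\n' '345|789' '999|1056' | gen_commands afile
```

Setting IFS only on the read command keeps the change local to that command, so the rest of the script still splits on whitespace. Once the printed commands look right, piping the same output into sh runs them.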
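I would use sed to process the sample data file because it is simple and fast. That requires a mechanism for converting the line-numbers file into the corresponding sed script. There are many ways to do that.
One way is to use sed to convert the set of line numbers into a sed script. If everything were going to standard output, this would be trivial. Because the output has to go to different files, we need a number for each line of the line-numbers file. One way to number the lines is the nl command; other line-numbering tools are possible too, and the same sed command line works either way: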
nl linenumbers.txt |
sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/'

For the given data file, that generates:
345,789w file1.txt
999,1056w file2.txt
1522,1366w file3.txt
3523,3562w file4.txt
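
To see what such a generated script does, here is a small sketch that builds one with nl and sed and applies it to a 10-line sample. The function name ranges_to_sed, the two ranges and the scratch directory are made up for illustration. Note that sed's w command takes the rest of the line as the file name, so the generated lines contain no shell redirection.

```shell
#!/bin/sh
# Hypothetical helper: turn "START|END" lines on standard input into sed
# "START,ENDw fileN.txt" commands, where N is the input line number from nl.
ranges_to_sed() {
    nl | sed 's/ *\([0-9]*\)[^0-9]*\([0-9]*\)|\([0-9]*\)/\2,\3w file\1.txt/'
}

# Show the generated script for two made-up ranges.
printf '%s\n' '2|4' '3|6' | ranges_to_sed

# Apply it to a 10-line sample in a scratch directory.
scratch=$(mktemp -d)
(
    cd "$scratch" || exit 1
    seq 1 10 > sample.data
    printf '%s\n' '2|4' '3|6' | ranges_to_sed > sed.script
    sed -n -f sed.script sample.data   # one pass over the data, two output files
    cat file1.txt                      # lines 2 to 4
    cat file2.txt                      # lines 3 to 6
)
rm -rf "$scratch"
```

Unlike running sed once per range, this reads the data file a single time and lets sed write every output file in that one pass.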

Another option is to have awk generate the sed script:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt

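If your version of sed allows you to read its script from standard input with -f - (GNU sed does; BSD sed does not), you can convert the line-numbers file into a sed script on the fly and use that script to process the sample data:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f - sample.data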

If your system supports /dev/stdin, you can use one of:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/stdin sample.data

awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt |
sed -n -f /dev/fd/0 sample.data

Otherwise, use an explicit script file:
awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' linenumbers.txt > sed.script
sed -n -f sed.script sample.data
rm -f sed.script

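Strictly, you should make sure that the temporary file name is unique (mktemp) and that the file is removed even if the script is interrupted (trap).
When the trap handler itself exits with status 1, a final trap 0 lets the script exit successfully; omit it, and the script will always exit with status 1.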
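
The mktemp and trap handling described above might look like the following. This is a sketch under assumptions rather than the answer's original snippet: the function name extract_ranges is made up, and the body runs in a subshell so that the traps do not leak into the calling shell.

```shell
#!/bin/sh
# Hypothetical helper: extract_ranges RANGES DATA
# Builds a temporary sed script from the "START|END" lines of RANGES and
# applies it to DATA, writing file1.txt, file2.txt, ... in the current directory.
extract_ranges() (
    tmp=$(mktemp) || exit 1
    # Remove the temporary file on exit (0) and on HUP, INT, QUIT, PIPE, TERM.
    trap 'rm -f "$tmp"; exit 1' 0 1 2 3 13 15

    awk -F'|' '{ printf "%d,%dw file%d.txt\n", $1, $2, NR }' "$1" > "$tmp" &&
        sed -n -f "$tmp" "$2"
    status=$?

    rm -f "$tmp"
    trap 0              # cancel the exit trap so its "exit 1" is not run
    exit "$status"
)

# Usage, with the file names from the text above:
#   extract_ranges linenumbers.txt sample.data
```

Without the final trap 0, the exit handler would run when the subshell finishes and force status 1 even after a successful run.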
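I have ignored Perl and Python; either could be used in a single command. The file management is just fiddly enough that using sed seems simpler. You could also use awk alone, either with a first awk script that writes a second awk script to do the heavy lifting (a trivial extension of the above), or with a single awk process that reads both files and produces the required output (harder, but far from impossible).
If nothing else, this shows that there are many possible ways of doing the job. If this is a one-off exercise, it really does not matter much which one you choose. If you will be doing this repeatedly, choose the mechanism you like. If you are worried about performance, measure. Most likely, converting the line numbers into a command script has a negligible cost, and processing the sample data with that command script is where the time goes. I would expect sed to excel at that point; I have not measured to confirm that it does.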
Here is one way, using GNU awk. Run it like this:
awk -f script.awk numbers.txt file.txt

Contents of script.awk:
BEGIN {
    # set the field separator
    FS="|"
}

# for the first file in the arguments list
FNR==NR {

    # add the row number and field one as keys to a multidimensional array with
    # a value of field two
    a[NR][$1]=$2

    # skip processing the rest of the code
    next
}

# for the second file in the arguments list
{
    # for every element in the array's first dimension
    for (i in a) {

        # for every element in the second dimension
        for (j in a[i]) {

            # ensure that the first field is treated numerically
            j+=0

            # if the line number is at least the first field
            # and at most the second field
            if (FNR>=j && FNR<=a[i][j]) {

                # print the line to a file with the suffix of the first file's
                # line number (the first dimension)
                print > ("File" i)
            }
        }
    }
}

If your awk lacks true multidimensional arrays (a GNU awk feature), this version of script.awk uses a SUBSEP-joined key instead:

BEGIN {
    # set the field separator
    FS="|"
}

# for the first file in the arguments list
FNR==NR {

    # add the row number and field one as a key to a pseudo-multidimensional
    # array with a value of field two
    a[NR,$1]=$2

    # skip processing the rest of the code
    next
}

# for the second file in the arguments list
{
    # for every element in the array
    for (i in a) {

        # split the element in to another array
        # b[1] is the row number and b[2] is the first field 
        split(i,b,SUBSEP)

        # if the line number is at least the first field
        # and at most the second field
        if (FNR>=b[2] && FNR<=a[i]) {

            # print the line to a file with the suffix of the first file's
            # line number (the first pseudo-dimension)
            print > ("File" b[1])
        }
    }
}

Alternatively, here is the same logic as a one-liner:
awk -F "|" 'FNR==NR { a[NR,$1]=$2; next } { for (i in a) { split(i,b,SUBSEP); if (FNR>=b[2] && FNR<=a[i]) print > ("File" b[1]) } }' numbers.txt file.txt
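
For experimenting with the single-process awk approach, here is a variation wrapped in a shell function. It is a sketch, not the answer's script: it stores the range bounds in two plain arrays instead of a SUBSEP-joined key, and the function name split_with_awk, the array names and the output prefix parameter are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split_with_awk RANGES DATA PREFIX
# Reads "START|END" lines from RANGES, then makes one pass over DATA and
# writes each line to PREFIXn for every range n that contains it.
split_with_awk() {
    awk -F'|' -v prefix="$3" '
        # first file: remember the bounds of range number NR
        FNR == NR { first[NR] = $1; last[NR] = $2; n = NR; next }

        # second file: test the current line number against every range
        {
            for (i = 1; i <= n; i++)
                if (FNR >= first[i] && FNR <= last[i])
                    print > (prefix i)
        }
    ' "$1" "$2"
}

# Small demonstration in a scratch directory (all names are illustrative).
demo_dir=$(mktemp -d)
seq 1 10 > "$demo_dir/file.txt"
printf '%s\n' '2|4' '3|5' > "$demo_dir/numbers.txt"
split_with_awk "$demo_dir/numbers.txt" "$demo_dir/file.txt" "$demo_dir/File"
cat "$demo_dir/File1"   # lines 2 to 4
cat "$demo_dir/File2"   # lines 3 to 5
rm -rf "$demo_dir"
```

The parentheses around the output file name avoid the ambiguity some awk implementations have with an unparenthesized concatenation after >.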

With GNU sed, this might also work:
sed -r 's/(.*)\|(.*)/\1,\2w file-\1-\2.txt/' linenumbers.txt | sed -nf - file