Linux: adding new lines with a character and an incrementing number to a file

linux sed

I have a file with many lines (about 40 million) that I am trying to break up for use in some downstream processes. The file looks like this:

a
b
c
d
e

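I want to break up the file by adding the string '>n' on a new line after every 1M lines, where n is an incrementing number. For the purposes of this question, an example that marks every 2 lines is fine. I would like my final output to be: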

a
b
>1
c
d 
>2
e

I am pretty sure sed can do this, but I cannot work out the incrementing-number part.

I don't think sed can do all of this on its own because (AFAIK) it cannot handle variables, but awk can. You could use the following script:

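BEGIN {
    id = 0                  # number of markers printed so far
}

{
    print $0                # always keep the original line
    if (NR % nth == 0) {    # after every nth line...
        id++
        print ">" id        # ...emit the marker: >1, >2, ...
    }
}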

Then run it as follows (without an input file argument, awk reads standard input):

awk -v nth=<your N value> -f /script/name /input/file > /new/file
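A quick way to check the behaviour against the five-line sample from the question is to run the print-then-mark logic inline with nth=2. This is only a sketch: the counter name c and the variable out are illustrative choices, not part of the answer. It prints every line and, after each nth line, a marker with an incrementing number.

```shell
# Feed the sample input from the question and mark every 2 lines.
# 'c' is an illustrative counter name; awk initialises it to 0 on first use.
out=$(printf 'a\nb\nc\nd\ne\n' |
    awk -v nth=2 '{ print } NR % nth == 0 { print ">" (++c) }')
printf '%s\n' "$out"
```

With nth=2 this should print a, b, >1, c, d, >2, e, one per line, which is the output requested in the question.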

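awk is the better choice here.

This one inserts the marker lines you want. Note that it prints each marker before its block, so the output starts with >1 ahead of the first line:

awk 'BEGIN{i=0}; {if ((NR-1) % 1000000 == 0) {i++; print ">" i}}; {print}' your_file > another_file

This one splits the file "your_file" directly into files named "your_file1", "your_file2", and so on. The output name is parenthesised because an unparenthesised concatenation after > is not parsed the same way by every awk:

awk 'BEGIN{i=0}; {if ((NR-1) % 1000000 == 0) {i++}} {print > ("your_file" i)}' your_file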
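To see the file-splitting variant at a small scale, the sketch below runs it on the five-line sample with a block size of 2. The names workdir, part, n and prefix are assumptions made for the demonstration, and the close() call is an addition that releases each finished output file before the next one is opened.

```shell
# Split a 5-line sample into files of 2 lines each inside a scratch directory.
workdir=$(mktemp -d)
printf 'a\nb\nc\nd\ne\n' > "$workdir/input.txt"

awk -v n=2 -v prefix="$workdir/part" '
    (NR - 1) % n == 0 {            # first line of a new block
        if (i) close(prefix i)     # finished with the previous output file
        i++
    }
    { print > (prefix i) }         # parentheses make the file name unambiguous
' "$workdir/input.txt"

ls "$workdir"
```

This should leave part1 (a, b), part2 (c, d) and part3 (e) next to input.txt.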

@Stephen: try this:

awk -v num=2 'FNR % num == 0 {print $0 ORS ">"++q ;next} 1'  Input_file

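You can set the number of lines through num above, and the marker is then printed in the output after every num lines. I used FNR rather than NR to count lines in case several Input_files are given: FNR is reset each time a new file starts, so the count begins again for every input file (NR does not do this).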
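The difference between FNR and NR only shows up with more than one input file. The sketch below, which uses two made-up three-line files (one.txt and two.txt are assumptions for the demonstration), runs the same program once with each variable.

```shell
# Two small input files in a scratch directory.
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/one.txt"
printf 'd\ne\nf\n' > "$tmp/two.txt"

# FNR restarts at 1 for each file; NR keeps counting across files.
with_fnr=$(awk -v num=2 'FNR % num == 0 { print $0 ORS ">" (++q); next } 1' \
    "$tmp/one.txt" "$tmp/two.txt")
with_nr=$(awk -v num=2 'NR % num == 0 { print $0 ORS ">" (++q); next } 1' \
    "$tmp/one.txt" "$tmp/two.txt")

printf 'FNR:\n%s\n\nNR:\n%s\n' "$with_fnr" "$with_nr"
```

With FNR the markers follow b and e (the second line of each file); with NR they follow b, d and f. In both cases the counter q keeps increasing across files, because only the line number is reset.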

EDIT: adding a full explanation of the code as well.

awk -v num=2           #### Set a variable named num to the value 2.
'FNR % num == 0        #### If FNR % num == 0 is true, run the action that follows. FNR is awk's built-in variable holding the line number within the current file; unlike NR, it is reset whenever a new Input_file is read, so it is the more useful of the two when awk reads several Input_files.
{print $0 ORS ">"++q ; #### Print the current line (when the condition above is true), then ORS (the output record separator, a newline by default), then ">" and a variable named q, which is incremented every time this action runs.
next}                  #### next skips all remaining statements for this line once the condition has matched, so the line is not printed a second time.
1                      #### awk works on "condition, then action" pairs. 1 is a condition that is always true, and with no action given the default action, printing the whole line, is performed.
'  Input_file          #### The Input_file to read.

I would use a simple shell script (upline.sh) to do this.

Run it with:

bash upline.sh < yourdatafile.txt

The split count can also be passed to the script as a parameter.
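The body of upline.sh is not reproduced here, so the following is only a guess at what such a script could look like, not the answerer's original. It is written as a function named upline (a hypothetical name) that reads standard input, takes the block size as an optional first argument, and falls back to an assumed default of 1000000.

```shell
# Hypothetical stand-in for upline.sh: copy stdin to stdout and print
# ">1", ">2", ... after every $1 lines.
upline() {
    step=${1:-1000000}   # lines per block; the default is an assumption
    count=0              # lines seen in the current block
    id=0                 # markers printed so far
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"
        count=$((count + 1))
        if [ "$count" -eq "$step" ]; then
            id=$((id + 1))
            printf '>%s\n' "$id"
            count=0
        fi
    done
}

printf 'a\nb\nc\nd\ne\n' | upline 2
```

A shell read loop like this is far slower than awk on a 40-million-line file, so it is mainly of interest where awk is not an option.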

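This might work for you (GNU sed): use seq to generate as many separator lines as you think necessary, then use the modulo arithmetic of the first~step address to insert them into the input file.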
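The exact command for the seq approach is not shown here, so the sketch below is one possible reading of the description rather than the answerer's command. It assumes GNU sed, whose first~step address and R command (read one line from a file per match, with /dev/stdin accepted as the file name) are GNU extensions; the step of 2 and the upper bound of 10 markers are chosen only for the five-line sample.

```shell
# Sample input in a temporary file.
input=$(mktemp)
printf 'a\nb\nc\nd\ne\n' > "$input"

# seq produces 1..10, the first sed turns them into >1..>10, and the second
# sed appends one of those lines after every 2nd input line (address 0~2).
marked=$(seq 1 10 | sed 's/^/>/' | sed '0~2R /dev/stdin' "$input")
printf '%s\n' "$marked"
```

Markers that are generated but never needed are simply left unread, so the upper bound given to seq only has to be large enough.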
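Another way, pure sed but not recommended, is:

sed -r '0~1000000!b;p;x;s/^9*$/0&/;:a;s/9(x*)$/x\1/;ta;s/$/#0123456789/;s/(.)(x*)#.*\1(.).*/\3\2/;s/x/0/g;h;s/^/>/' file

This uses the same modulo arithmetic, keeps a counter in the hold space, and increments it before inserting it into the output.

However, since the real purpose of this exercise is to split a large file into smaller ones, why not just use split:

split -a1 --numeric-suffixes=1 -l 1000000  file '>'

This splits the file into files named >1 to >n, each a million lines long. Note that -a1 limits the suffix to a single digit, so at most nine files (>1 to >9) can be created; a 40-million-line input needs -a2, which names the files >01, >02, and so on.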
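For a small-scale check of the split approach, the sketch below cuts the five-line sample into pieces of two lines. It assumes GNU split (for --numeric-suffixes=1), uses the illustrative prefix part instead of '>' so the resulting names need no quoting, and sets the suffix length to 2 so that more than nine pieces would fit.

```shell
# Split a 5-line sample into 2-line pieces named part01, part02, ...
dir=$(mktemp -d)
printf 'a\nb\nc\nd\ne\n' > "$dir/file"

split -a 2 --numeric-suffixes=1 -l 2 "$dir/file" "$dir/part"

ls "$dir"
```

This should produce part01 (a, b), part02 (c, d) and part03 (e) alongside the original file.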
Comments:

"I want to break up the file": do you want to split it into multiple files, or just insert '>n' after every million lines?

Just inserting >n after every 1M lines is fine. In the software I use downstream, '>' denotes a split. Maybe "breaking up" the file is a better term.

AWK would be a better choice than sed...

If you are trying to split a FASTA file, explaining your exact requirement might help you reach your actual end goal faster.

(On the first awk script, as originally posted.) Use id=1 to match the OP's expected output. You also need to print $0 when NR % nth == 0, or always print and then check the condition. It can be simplified to: awk -v nth=2 '{print} NR%nth==0 {print ">" ++c}' file

This is not what he wants; this replaces every nth line with ">".

@putonspectacles No, it replaces it with ">id".

You could also add && NR>1 to your condition, otherwise it also acts on the first line.

This answer would be better if the commands were explained. For example, -v var=num assigns the value num to the variable var before the program starts. Explain what FNR does.

I think I was adding the explanation just as you commented :)