Changing a specific number in a file based on the file name in Linux

linux awk sed

I have an input file named part2.txt that contains thousands of lines like the following:

   46742       1   48276   48343   48199   48198
   46744       1   48343   48344   48200   48199
   46746       1   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

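I need to change every integer in the second column to the number found in the file name (part2.txt), so that each integer 1 becomes 2. The value is not necessarily 1, it may be any other integer, and there are not just three such rows but possibly thousands. The result should be: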

   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

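Note that all columns are separated by spaces, and that there are also some spaces to the left of the first column. I tried doing this with FNR, but it was not very robust, so I am asking for a way to do it with sed or awk on Linux.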

You can work with FILENAME using a function like this:

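awk 'function name(file) {
        gsub(/[^0-9]+/, "", file)      # strip everything that is not a digit
        return file
     }
     {digits = name(FILENAME)}         # FILENAME is only set once input is being read
     $2 ~ /^[0-9]+$/ {$2=digits}       # assigning $2 rebuilds the line with single spaces
     1' a2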

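What I do not understand is why I cannot call the function in BEGIN{}. I suppose it is because FILENAME is not yet available at that point. The downside is that the function then gets called for every line. We could set a flag once the value has been computed, but I will leave that as an exercise :)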
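One possible way around this, as a sketch only: the command-line operands are already visible in BEGIN through ARGV, even though FILENAME is not set there yet. The snippet assumes the data file is the first operand (ARGV[1]); the helper name renumber and the sample file are made up for illustration, and the step that drops the directory part is an addition so that digits in the path do not leak into the result.

```shell
# Hypothetical helper: replace integers in column 2 with the digits taken
# from the file name. Assumes the file is the first operand (ARGV[1]).
renumber() {
    awk '
    BEGIN {
        digits = ARGV[1]               # FILENAME is not set yet, ARGV is
        sub(/.*\//, "", digits)        # drop any directory part
        gsub(/[^0-9]/, "", digits)     # keep only the digits
    }
    $2 ~ /^[0-9]+$/ { $2 = digits }    # rebuilds the line with single spaces
    1' "$1"
}

# Demo on a small sample file created in a temporary directory.
dir=$(mktemp -d)
printf '   46742       1   48276\n   48283  3.58e+01 -2.97e+00\n' > "$dir/part2.txt"
renumber "$dir/part2.txt"
rm -r "$dir"
```

Like the function above, this collapses the spacing on every line whose second field is replaced; lines that are not touched keep their original layout.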

Update: I do not know what I was missing before I wrote the function, because this works fine without it:

awk '{digits = FILENAME; gsub(/[^0-9]+/, "", digits)} $2 ~ /^[0-9]+$/ {$2 = digits} 1' a2.txt

To avoid computing digits on every line, you can use the NR==1{} trick (credit to Wintermute's answer, +1).
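A sketch of that trick, using FNR == 1 rather than NR == 1 so that the number is recomputed on the first line of each file when several files are passed at once. The sample files are made up, and stripping the directory part is an addition that the answers here do not include.

```shell
dir=$(mktemp -d)
printf '   46742       1   48276\n' > "$dir/part2.txt"
printf '   46744       1   48343\n' > "$dir/part13.txt"

result=$(awk '
FNR == 1 {                         # first line of each input file
    digits = FILENAME
    sub(/.*\//, "", digits)        # ignore digits in directory names
    gsub(/[^0-9]/, "", digits)     # keep only the digits of the base name
}
$2 ~ /^[0-9]+$/ { $2 = digits }    # integer in column 2: replace it
1' "$dir/part2.txt" "$dir/part13.txt")

printf '%s\n' "$result"
rm -r "$dir"
```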

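This can be done by combining sed with shell variables. The scenarios below should each give the output you expect. In addition, if you want to modify the file in place, you can use sed -i instead of sed.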

If you know the file's number, this will work, assuming $n holds that number (for example, n=2 for part2.txt):
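A minimal sketch of this first case, assuming it applies the same substitution as the command given for the second case below. It is written with POSIX bracket expressions instead of the GNU \s and \+ shorthands, and the sample file is made up.

```shell
n=2
dir=$(mktemp -d)
printf '   46742       1   48276   48343\n   48283  3.58e+01 -2.97e+00\n' > "$dir/part$n.txt"

# \1 = leading blanks, first column and the blanks after it;
# \2 = the whitespace character that follows the second column.
# A float such as 3.58e+01 is skipped because its digits are followed by "."
result=$(sed 's:^\([[:space:]]*[0-9][0-9]*[[:space:]][[:space:]]*\)[0-9][0-9]*\([[:space:]]\):\1'"$n"'\2:' "$dir/part$n.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

Because only the second column itself is rewritten, the surrounding spacing is kept, although the columns shift if $n has a different number of digits than the value it replaces.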

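Otherwise, if the file name with its .txt extension is stored in $f (for example, f=part2.txt), this should work (bash and GNU sed are assumed, because of <<< and the \s, \+ and \| shorthands):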

f=part2.txt; n=$(sed 's:^\(.*[^0-9]\|\)\([0-9]\+\)\.txt:\2:' <<<"$f"); sed 's:^\(\s*[0-9]\+\s\+\)\([0-9]\+\)\(\s\):\1'"$n"'\3:' "$f"

Using gawk (for RT), keeping the formatting as intact as possible:

$ gawk -v RS='\\s+' 'NR == 1 { n = FILENAME; gsub(/[^0-9]/, "", n) } NR % 6 == 3 && int($0) == $0 { $0 = n } { printf "%s%s", $0, RT }' part2.txt
   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02

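With RS set to \s+, every field becomes a record of its own, and the whitespace that follows each record is remembered in RT, which we later use for printing. The code is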

NR == 1 {                      # First record of the file:
  n = FILENAME                 # isolate the number from the file name
  gsub(/[^0-9]/, "", n)
}
NR % 6 == 3 && int($0) == $0 { # after that: for every sixth record, if it
                               # is an integer,
  $0 = n                       # replace it with the isolated number.
                               # It is NR % 6 == 3 instead of == 2 because
                               # the file begins with whitespace that our
                               # RS matches, so the first record is an empty
                               # one and the first column of the first row
                               # is the second record.
}
{ printf "%s%s", $0, RT }      # after that: print every record followed by
                               # its remembered record terminator.
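The NR % 6 == 3 test lines up only while every preceding row has six fields; in the sample it works because the six-column rows come first. As a sketch of one way to relax that, the variant below counts columns per row and resets the counter whenever RT contains a newline. It still needs gawk for RT; the counter name col and the sample file (with the four-column row first) are illustrative, not part of the answer above.

```shell
dir=$(mktemp -d)
printf '   48283  3.58e+01 -2.97e+00\n   46742       1   48276\n' > "$dir/part2.txt"

if command -v gawk >/dev/null 2>&1; then
    result=$(gawk -v RS='[[:space:]]+' '
    FNR == 1 { n = FILENAME; sub(/.*\//, "", n); gsub(/[^0-9]/, "", n); col = 0 }
    $0 != "" { col++ }                        # leading blanks give an empty record
    col == 2 && $0 ~ /^[0-9]+$/ { $0 = n }    # second column, plain integer
    { printf "%s%s", $0, RT }                 # put the original whitespace back
    RT ~ /\n/ { col = 0 }                     # a newline ends the row
    ' "$dir/part2.txt")
    printf '%s\n' "$result"
fi
rm -r "$dir"
```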

Using GNU awk for gensub():
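The gensub() script is not reproduced here. The following is only a sketch of what such a version might look like, assuming it builds the format string the same way as the POSIX script further down; the sample file is made up, and the directory-stripping line is an addition.

```shell
dir=$(mktemp -d)
printf '   46742       1   48276\n   48283  3.58e+01 -2.97e+00\n' > "$dir/part2.txt"

if command -v gawk >/dev/null 2>&1; then
    result=$(gawk '
    {
        # turn the line into a printf format: keep everything as it is,
        # but put %s where the second field was
        fmt = gensub(/^([[:space:]]*[^[:space:]]+[[:space:]]+)[^[:space:]]+/, "\\1%s", 1) "\n"
        num = FILENAME
        sub(/.*\//, "", num)           # ignore digits in directory names
        gsub(/[^0-9]/, "", num)
        printf fmt, ($2 ~ /^[0-9]+$/ ? num : $2)
    }' "$dir/part2.txt")
    printf '%s\n' "$result"
fi
rm -r "$dir"
```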

The same can be done in any awk using match() and substr().

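The approach above preserves the input spacing by turning each input line into a format string in which only the field to be changed is replaced with %s. It will fail if the input already contains printf formatting sequences such as %s, but you do not have that situation, and if you did you could probably use a simple gsub(/%/,"%%") as the first line to convert every % in each input line into a literal, which would take care of it.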

Here is a version that works with any POSIX awk:

$ cat tst.awk
{
    match($0,/[[:space:]]*[^[:space:]]+[[:space:]]+/)
    fmt = substr($0,1,RLENGTH) "%s" 
    match($0,/[[:space:]]*[^[:space:]]+[[:space:]]+[^[:space:]]+/)
    fmt = fmt substr($0,RLENGTH+1) "\n"
    num = FILENAME
    gsub(/[^0-9]/,"",num)
    printf fmt, ($2~/^[0-9]+$/ ? num : $2)
}
$ 
$ awk -f tst.awk part2.txt
   46742       2   48276   48343   48199   48198
   46744       2   48343   48344   48200   48199
   46746       2   48344   48332   48201   48200
   48283  3.58077402e+01 -2.97697746e+00  1.50878647e+02
   48282  3.67231688e+01 -2.97771595e+00  1.50419488e+02
   48285  3.58558188e+01 -1.98122787e+00  1.50894850e+02
   48287  3.67678239e+01 -1.98150619e+00  1.50432492e+02
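As a sketch of the % caveat mentioned above: a variant of tst.awk that never uses the input line as a format string, printing the text before and after the second field with a fixed format instead. The variable names head, rest and tail are illustrative, [ \t] stands in for [[:space:]] for awks that lack POSIX character classes, and the sample file (which contains a literal %) is made up.

```shell
dir=$(mktemp -d)
printf '   46742       1   100%%\n   48283  3.58e+01 -2.97e+00\n' > "$dir/part2.txt"

result=$(awk '
FNR == 1 { num = FILENAME; sub(/.*\//, "", num); gsub(/[^0-9]/, "", num) }
{
    if (match($0, /^[ \t]*[^ \t]+[ \t]+/) && $2 ~ /^[0-9]+$/) {
        head = substr($0, 1, RLENGTH)          # up to the start of field 2
        rest = substr($0, RLENGTH + 1)         # field 2 and everything after
        match(rest, /^[^ \t]+/)
        tail = substr(rest, RLENGTH + 1)       # everything after field 2
        printf "%s%s%s\n", head, num, tail     # input text is never a format
    } else {
        print                                  # leave other lines untouched
    }
}' "$dir/part2.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

The file number is also computed once per file here (FNR == 1) rather than on every line.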