Manipulating Columns with Bash & Awk
Let's assume we have a file example1.txt consisting of the following lines:
item item item
A B C
100 20 2
100 22 3
100 23 4
101 26 2
102 28 2
103 29 3
103 30 2
103 32 2
104 33 2
104 34 2
104 35 2
104 36 3
I want to run a few commands to filter the txt file and add more columns.
First, I want to apply a condition: keep only the rows where item C equals 2. Using the awk command, I can do that as follows:
awk '$3 == 2 { print $1 "\t" $2 "\t" $3 }' example1.txt > example2.txt
The returned text file will then be:
item item item
A B C
100 20 2
101 26 2
102 28 2
103 30 2
103 32 2
104 33 2
104 34 2
104 35 2
Now I want to count two things:
I want to count the total number of unique values in column 1. For example, in example2.txt above, that would be:
(100,101,102,103,104) = 5
(a quick way to compute this is sketched right after the question)
I want to assign a count to the duplicated values and add it as a new column, like this:
item item item item
A B C D
100 20 2 1
101 26 2 1
102 28 2 1
103 30 2 2
103 32 2 2
104 33 2 3
104 34 2 3
104 35 2 3
In column D above, row 1 gets 1 because 100 has no duplicates, while rows 4 and 5 get 2 because 103 appears twice, so I put 2 in both of those rows. Likewise, the last three rows get 3 because the column-A value 104 is repeated three times in them.
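For the first request, counting the unique values in column 1, a minimal awk sketch (assuming the same example1.txt layout with its two header lines) could be:

awk 'FNR > 2 && $3 == 2 && !seen[$1]++ { count++ } END { print count }' example1.txt

Here !seen[$1]++ is true only the first time a given column-1 value appears, so count ends up at 5 for the data above. A non-awk alternative along the same lines would be awk 'FNR > 2 && $3 == 2 { print $1 }' example1.txt | sort -u | wc -l.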
Could you please try the following. In case you want to save the output into the same Input_file, append
> temp && mv temp Input_file
to the code below:
awk '
FNR==NR{
if($3==2){
a[$1,$3]++
}
next
}
FNR==1{
$(NF+1)="item"
print
next
}
FNR==2{
$(NF+1)="D"
print
next
}
$3!=2{
next
}
FNR>2{
$(NF+1)=a[$1,$3]
}
1
' Input_file Input_file | column -t
The output will be as follows:
item item item item
A B C D
100 20 2 1
101 26 2 1
102 28 2 1
103 30 2 2
103 32 2 2
104 33 2 3
104 34 2 3
104 35 2 3
Explanation: adding a detailed explanation for the above code.
awk ' ##Starting awk program from here.
FNR==NR{ ##Checking condition if FNR==NR which will be TRUE when 1st time Input_file is being read.
if($3==2){ ##Checking condition if 3rd field is 2 then do following.
a[$1,$3]++ ##Creating an array a whose index is $1,$3 and incrementing its count by 1 here.
}
next ##next will skip further statements from here.
}
FNR==1{ ##Checking condition if this is first line.
$(NF+1)="item" ##Adding a new field with string item in it.
print ##Printing 1st line here.
next ##next will skip further statements from here.
}
FNR==2{ ##Checking condition if this is second line.
$(NF+1)="D" ##Adding a new field with string item in it.
print ##Printing 1st line here.
next ##next will skip further statements from here.
}
$3!=2{ ##Checking condition if 3rd field is NOT equal to 2 then do following.
next ##next will skip further statements from here.
}
FNR>2{ ##Checking condition if line is greater than 2 then do following.
$(NF+1)=a[$1,$3] ##Creating new field with value of array a with index of $1,$3 here.
}
1 ##1 will print edited/non-edited lines here.
' Input_file Input_file ##Mentioning Input_file names 2 times here.
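Putting the in-place save together with the column -t formatting, the full invocation would look something like this (a sketch, where '...' stands for the program above and Input_file for your file name):

awk '...' Input_file Input_file | column -t > temp && mv temp Input_file

The pipeline writes into a temporary file first, and mv only overwrites Input_file if the pipeline succeeded.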
Assuming the file is not a huge one:
awk 'NR==FNR && $3 == 2{a[$1]++;next}$3==2{$4=a[$1];print;}' file.txt file.txt
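Note that this one-liner drops the two header lines, since their third field is never 2. A variant that also carries the headers through might look like this (just a sketch of the same two-pass idea):

awk -v OFS='\t' 'NR==FNR { if ($3 == 2) a[$1]++; next }
FNR <= 2 { print $0, (FNR == 1 ? "item" : "D"); next }
$3 == 2 { print $0, a[$1] }' file.txt file.txt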
You can parse the file twice. In the first iteration, compute the count for the fourth column and store it in an array; in the second parse, set the count as the fourth column and print the whole line. You can try this awk:
awk -v OFS='\t' 'NR <= 2 {
print $0, (NR == 1 ? "item" : "D")
}
FNR == NR && $3 == 2 {
++freq[$1]
next
}
$3 == 2 {
print $0, freq[$1]
}' file{,}
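Here file{,} is Bash brace expansion that simply expands to file file, i.e. the same file named twice for the two passes. You can check it with, for example:

echo file{,}    # prints: file file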
Similar to the others, but using a single pass of awk: the records are stored in the seen array and the counts for D in the D array, while the arrays ord and Dcnt map the order in which each record appears, e.g.
awk '
FNR == 1 { h1=$0"\titem" } # header 1 with extra "\titem"
FNR == 2 { h2=$0"\tD" } # header 2 with extra "\tD"
FNR > 2 && $3 == 2 { # remaining rows with $3 == 2
D[$1]++ # for D column: times $1 seen
seen[$1,$2] = $0 # save records seen
ord[++n] = $1 SUBSEP $2 # save order all records appear
Dcnt[n] = $1 # save order mapped to $1 for D
}
END {
printf "%s\n%s\n", h1, h2 # output headers
for (i=1; i<=n; i++) # loop outputting info with D column added
print seen[ord[i]]"\t"D[Dcnt[i]]
}
' example.txt
There is always more than one way to skin the cat with awk. If you want to save the output into the same input file, then append > temp && mv temp Input_file to the above code.

Example use/output:
item item item item
A B C D
100 20 2 1
101 26 2 1
102 28 2 1
103 30 2 2
103 32 2 2
104 33 2 3
104 34 2 3
104 35 2 3