Linux: Printing a search pattern in awk
I want to print the lines that match a search pattern and then compute the average over those lines. An example works best. Input file:
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
chr17 41275978 41276294 BRCA1_ex02_03 283
chr17 41275978 41276294 BRCA1_ex02_03 284
chr17 41275978 41276294 BRCA1_ex02_03 285
chr17 41275978 41276294 BRCA1_ex02_04 286
chr17 41275978 41276294 BRCA1_ex02_04 287
chr17 41275978 41276294 BRCA1_ex02_04 288
I want to extract (in a bash loop, for example) the lines that share the same value in column 4:
Output 1:
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
Output 2:
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
Output 3:
chr17 41275978 41276294 BRCA1_ex02_03 283
chr17 41275978 41276294 BRCA1_ex02_03 284
chr17 41275978 41276294 BRCA1_ex02_03 285
and so on. Computing the average of column 5 for each of these is then very simple:
awk '{ sum += $5 } END { if (NR > 0) print sum / NR }' _file.txt
In my case there are thousands of BRCA1_exXX_XX lines, so any ideas on how to split this?
Paul.

Assuming the entries are sorted by column 4, as they are in the given data, you can do this:
awk '
$4 != prev {                      # this line's 4th column differs from the previous line
    if (cnt > 0)                  # at least one line was seen for the previous key
        print prev, sum / cnt     # print the average for the previous key
    prev = $4                     # remember the current 4th column
    sum = $5                      # initialize sum to column 5
    cnt = 1                       # initialize count to 1
    next                          # go to next line
}
{
    sum += $5                     # accumulate total of 5th column
    ++cnt                         # increment count of lines
}
END {
    if (cnt > 0)                  # avoid divide by 0 on an empty file
        print prev, sum / cnt     # print the average for the last key
}
' file
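If the per-group lines themselves are needed as separate outputs, as in the question, awk can also write each line to a file named after column 4. The following is only a minimal sketch under the same sorted-input assumption; the file name `sample.txt`, the `<column 4>.txt` naming scheme and the temporary directory are illustrative choices, not part of the original post.

```shell
# Sketch: write each group of lines to its own file, named after column 4.
# Assumes the input is sorted by column 4. "sample.txt" and the "<key>.txt"
# naming are hypothetical, chosen only for this illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sample.txt <<'EOF'
chr17 41275978 41276294 BRCA1_ex02_01 278
chr17 41275978 41276294 BRCA1_ex02_01 279
chr17 41275978 41276294 BRCA1_ex02_01 280
chr17 41275978 41276294 BRCA1_ex02_02 281
chr17 41275978 41276294 BRCA1_ex02_02 282
EOF

awk '
$4 != prev {                     # column 4 changed: switch to a new output file
    if (out != "") close(out)    # release the previous file handle
    prev = $4
    out = $4 ".txt"
}
{ print > out }                  # write the current line to its group file
' sample.txt
```

Closing each file once its group ends keeps the number of open files at one, which matters when there are thousands of distinct keys. With unsorted input a key could reappear after its file was closed, and `>` would then truncate the file, so this sketch should not be used without sorting first.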
I think this will do what you want:
awk '{
    # Keep a running sum of the fifth column, keyed by the fourth column.
    v[$4] += $5
    # Keep a count of the lines seen for each fourth-column value.
    n[$4]++
}
END {
    # Loop over all the keys we saw and print each key with the average of its fifth column.
    for (val in n) {
        print val ": " v[val] / n[val]
    }
}' "$file"
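If a standard deviation per group is wanted as an extra column, the same array approach can carry a running sum of squares. This is a minimal sketch, not the answerer's code: the function name `group_stats`, the array names `sum` and `sq`, and the choice of the population standard deviation (dividing by n rather than n - 1) are all assumptions made for illustration.

```shell
# Sketch: per-group mean and population standard deviation of column 5.
# "group_stats", "sum" and "sq" are hypothetical names used only here.
group_stats() {
    awk '{
        sum[$4] += $5          # running sum of column 5 per key
        sq[$4]  += $5 * $5     # running sum of squares per key
        n[$4]++                # number of lines per key
    }
    END {
        for (key in n) {
            mean = sum[key] / n[key]
            var  = sq[key] / n[key] - mean * mean   # population variance
            if (var < 0) var = 0                    # guard against rounding error
            printf "%s %.2f %.2f\n", key, mean, sqrt(var)
        }
    }' "$@"
}
```

Called as `group_stats file`, it prints one line per key with the key, the mean and the standard deviation. The sum-of-squares formula can lose precision when values are large and close together, so a two-pass calculation may be preferable for such data.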
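This assumes the entries are always sorted.

Wow, looks like it works :-) Thanks! Is an explanation possible? And could I add the standard deviation as a third column?

@EtanReisner Yes, it assumes the entries are sorted by the 4th column, as in the given data.

Just add a test on n in the END section to avoid a divide-by-zero error on an empty file. Never use the letter l as a variable name, because it looks too much like the digit 1. In some fonts the two are completely indistinguishable.

@Edmonton Quite right. I used it to stand for "line", but that did not make much sense here either. Edited.

Yes, that is great, it works very well. Thanks for the explanation!

This will output the data in random order. Assuming you want the output in the same order as the input, just move n[$4]++ out of the current action block and add a new condition + action, !n[$4]++ {keys[++numKeys]=$4}, then loop in the END section with for (k=1; k...)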
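The last comment's suggestion for input-order output can be sketched as follows. The comment's loop is cut off in the text, so the bound `k <= numKeys` is an assumed completion, and the function name `ordered_avg` is hypothetical.

```shell
# Sketch: per-key average of column 5, printed in order of first appearance.
# "ordered_avg" is a hypothetical name; the loop bound is an assumption.
ordered_avg() {
    awk '
    !n[$4]++ { keys[++numKeys] = $4 }   # first time a key is seen: record its position
    { v[$4] += $5 }                     # running sum of column 5 per key
    END {
        # assumed completion of the truncated loop: k <= numKeys
        for (k = 1; k <= numKeys; k++) {
            key = keys[k]
            print key ": " v[key] / n[key]
        }
    }' "$@"
}
```

Because `!n[$4]++` is evaluated on every line, it still counts the lines per key while also detecting the first occurrence, so no separate counter is needed. On an empty file `numKeys` is 0 and the loop body never runs, so no division by zero occurs.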