Awk: count the number of specific patterns in columns

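Given the file below, I want to count how often each non-identical pattern (one whose two halves differ, such as A/T) occurs in each column: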

A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G
Output:

A/T = 2/5
C/G = 4/5
C/G = 3/5
First column:
-------------
A/T = 2/5
T/T = 1/5
A/A = 2/5

Second column:
-------------
C/C = 1/5
C/G = 4/5

Third column:
-------------
G/G = 1/5
C/C = 1/5
C/G = 3/5
I tried some code in AWK, but it doesn't seem to work. Thanks for your help.

Edit:

I recreated my file as follows:

A A C G C G
A T C C G G
A A C G C C
A T C G C G
T T C G C G

awk '$1 != $2 {n++}; END {print n}' file
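This gives the desired count for the first two columns. Now I want to loop over the columns and check whether each pair of columns is equal, i.e. 1 with 2, 3 with 4, and so on.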


How can I loop over only the odd-numbered columns?

This might help. However, there may be a better way to do it:

text = """A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G"""

first_column = list()
second_column = list()
third_column = list()

for row in text.strip().split('\n'):
    columns = row.split()
    first_column.append(columns[0])
    second_column.append(columns[1])
    third_column.append(columns[2])

first_column_ocurrences = dict((i, "{}/{}".format(first_column.count(i), len(first_column))) for i in first_column)
second_column_ocurrences = dict((i, "{}/{}".format(second_column.count(i), len(second_column))) for i in second_column)
third_column_ocurrences = dict((i, "{}/{}".format(third_column.count(i), len(third_column))) for i in third_column)

# print() with a single argument behaves the same under Python 2 and Python 3
print("First column:")
print("-------------")
for k, v in first_column_ocurrences.items():
    print("{} = {}".format(k, v))

print("\nSecond column:")
print("-------------")

for k, v in second_column_ocurrences.items():
    print("{} = {}".format(k, v))

print("\nThird column:")
print("-------------")

for k, v in third_column_ocurrences.items():
    print("{} = {}".format(k, v))
Output:

First column:
-------------
A/T = 2/5
T/T = 1/5
A/A = 2/5

Second column:
-------------
C/C = 1/5
C/G = 4/5

Third column:
-------------
G/G = 1/5
C/C = 1/5
C/G = 3/5
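
The script above lists every pattern in each column. If only the patterns whose two halves differ are wanted (the first output block in the question), a minimal sketch could filter them while counting. The function name `mixed_pattern_frequencies` and the inline `sample` string are illustrative assumptions, not part of the original answer:

```python
from collections import Counter


def mixed_pattern_frequencies(text):
    """Return one dict per column mapping each pattern whose two halves
    differ (e.g. 'A/T') to a 'count/total_rows' string."""
    rows = [line.split() for line in text.strip().splitlines()]
    total = len(rows)
    result = []
    for column in zip(*rows):  # transpose rows into columns
        counts = Counter()
        for pattern in column:
            left, right = pattern.split("/")
            if left != right:  # skip identical pairs such as A/A
                counts[pattern] += 1
        result.append({p: "{}/{}".format(n, total) for p, n in counts.items()})
    return result


# Hypothetical inline copy of the question's data.
sample = """A/A C/G C/G
A/T C/C G/G
A/A C/G C/C
A/T C/G C/G
T/T C/G C/G"""

for column in mixed_pattern_frequencies(sample):
    for pattern, freq in column.items():
        print("{} = {}".format(pattern, freq))
# prints:
# A/T = 2/5
# C/G = 4/5
# C/G = 3/5
```

Using `zip(*rows)` avoids one hand-written list per column, so the same sketch should work for any number of columns as long as every row has the same width.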

I would do it like this:

from collections import Counter

with open('file.txt', 'r') as raw_data:
    data = [line.strip().split() for line in raw_data.readlines()]
a = [record[0] for record in data]
b = [record[1] for record in data]
c = [record[2] for record in data]

# print() with a single argument behaves the same under Python 2 and Python 3
print(Counter(a))
print(Counter(b))
print(Counter(c))
It prints the data as dictionaries, but you can take it from there, right?
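
As one possible way of taking it from there, a `Counter` can be turned into the `pattern = count/total` lines shown in the question. This is only a sketch: `format_counter` is a hypothetical helper name, and the total is assumed to be the sum of all counts in that column:

```python
from collections import Counter


def format_counter(counter):
    """Format a Counter as 'pattern = count/total' lines, most frequent first."""
    total = sum(counter.values())  # number of rows that went into this column
    return ["{} = {}/{}".format(pattern, n, total)
            for pattern, n in counter.most_common()]


# Hypothetical first column taken from the question's data.
first_column = Counter(["A/A", "A/T", "A/A", "A/T", "T/T"])
print("\n".join(format_counter(first_column)))
```

The order of patterns with equal counts is not something to rely on, so sort the lines explicitly if a fixed order matters.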

awk to the rescue!

Works for any even number of columns:

awk '{for(i=1;i<=NF;i+=2) 
         if($i!=$(i+1)) 
             a["column "i": "$i"/"$(i+1)]++} 
  END{for(k in a) print k,a[k]"/"NR}' file

column 1: A/T 2/5
column 3: C/G 4/5
column 5: C/G 3/5
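
For readers following the Python answers in this thread, the same pairwise loop can be sketched in Python. This is a rough equivalent under stated assumptions, not a translation guaranteed to match awk in every edge case: `count_unequal_pairs` is a hypothetical name, and unlike awk's `NR` it skips blank lines when counting rows:

```python
from collections import Counter


def count_unequal_pairs(lines):
    """Compare fields 1&2, 3&4, ... on each line and count the pairs that differ.

    Returns (counts, nrows); counts is keyed by (1-based column, 'X/Y')."""
    counts = Counter()
    nrows = 0
    for line in lines:
        fields = line.split()
        if not fields:  # assumption: blank lines are not data rows
            continue
        nrows += 1
        # Step by 2 so that i is the first index of each pair.
        for i in range(0, len(fields) - 1, 2):
            if fields[i] != fields[i + 1]:
                counts[(i + 1, fields[i] + "/" + fields[i + 1])] += 1
    return counts, nrows


# Hypothetical inline copy of the edited input file.
data = """A A C G C G
A T C C G G
A A C G C C
A T C G C G
T T C G C G""".splitlines()

counts, nrows = count_unequal_pairs(data)
for (column, pattern), n in sorted(counts.items()):
    print("column {}: {} {}/{}".format(column, pattern, n, nrows))
# prints:
# column 1: A/T 2/5
# column 3: C/G 4/5
# column 5: C/G 3/5
```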

You don't need to store the lines in memory at all, and you can also use the csv library to do the parsing:

from collections import Counter
import csv
with open('file.txt', 'r') as raw_data:
    cn_a, cn_b, cn_c = Counter(), Counter(), Counter()
    for a, b, c in csv.reader(raw_data, delimiter=" "):
        cn_a[a] += 1
        cn_b[b] += 1
        cn_c[c] += 1
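
The snippet above is tied to exactly three columns. A sketch of the same streaming idea for any number of columns might look like the following. The name `column_counters` is hypothetical, the data is read from an in-memory `io.StringIO` stand-in instead of `file.txt`, and fields are assumed to be separated by exactly one space (with `delimiter=" "`, repeated spaces would produce empty fields):

```python
import csv
import io
from collections import Counter


def column_counters(handle):
    """Stream rows from a file-like object and keep one Counter per column."""
    counters = []
    for row in csv.reader(handle, delimiter=" "):
        if not row:  # csv.reader yields [] for blank lines
            continue
        # Grow the list the first time a wider row is seen.
        while len(counters) < len(row):
            counters.append(Counter())
        for counter, value in zip(counters, row):
            counter[value] += 1
    return counters


# Hypothetical stand-in for open('file.txt').
raw_data = io.StringIO(
    "A/A C/G C/G\nA/T C/C G/G\nA/A C/G C/C\nA/T C/G C/G\nT/T C/G C/G\n"
)
for index, counter in enumerate(column_counters(raw_data), start=1):
    print("column {}: {}".format(index, dict(counter)))
```

Only the counters are kept in memory, so the cost grows with the number of distinct patterns rather than with the number of lines.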

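It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you have written so far, example input (if there is any), the expected output, and the output you actually get (console output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive.

So where is the code, and what exactly does "doesn't seem to work" mean?

Because of your edit you now have two separate questions. Which one should we answer?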