Python: compare two lists and find the matching entries


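I have a list like this:

C
E

I want to find these entries in the table below (Table 1) and write the matching rows to a second table (Table 2).

Does anyone have a Python or Perl script to do this?

Table 1: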

A   MU_ADO_2    1099    MU_ADO_2.1099   o   o   o   o   o   o   o   o   o   o   7.82436 s_3_merged  Suseptible  A   AG  2   4   0   2   0                                                                               
A   MU_ADO_2    1105    MU_ADO_2.1105   327.008 s_2_merged  Resistance  G   GT  81  0   2   132 79  31.5281 s_6_merged  Resistance  G   GT  8   0   1   8   7   34.9813 s_3_merged  Suseptible  G   GT  7   0   0   3   7   7.82436 s_7_merged  Suseptible  G   GT  2   0   0   4   2
A   MU_ADO_2    1110    MU_ADO_2.1110   515.963 s_2_merged  Resistance  A   AT  113 96  1   2   110 31.5281 s_6_merged  Resistance  A   AT  7   8   0   0   7   16.3388 s_3_merged  Suseptible  A   AT  4   7   0   0   4   13.808  s_7_merged  Suseptible  A   AT  3   3   0   0   3
A   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
B   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
D   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
F   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
Table 2:

C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
C   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
E   MU_ADO_2    1120    MU_ADO_2.1120   1.00E+06    s_2_merged  Resistance  A   AG  169 107 1   167 1   36.1308 s_6_merged  Resistance  A   AG  8   9   0   8   0   35.6751 s_7_merged  Suseptible  A   AG  7   2   0   7   0   20.9415 s_3_merged  Suseptible  A   AG  5   8   0   5   0
If your question is "How do I filter this file to see only the entries whose first field equals C or E?", then the following should work:

awk '$1 ~ /[CE]/ { print $0 }' yourfile > outfile
If you want to save a few keystrokes at the expense of clarity, the following also works:

awk '$1 ~ /[CE]/' yourfile > outfile
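The same filter can be sketched in Python with an exact comparison on the first field, which avoids any doubt about whether a regex match means "contains" or "equals". This is only a minimal sketch and not part of the original answer: the function name filter_by_first_field and the shortened sample rows are made up for illustration.

```python
def filter_by_first_field(lines, keys):
    """Return the lines whose first whitespace-separated field is in keys."""
    wanted = set(keys)
    kept = []
    for line in lines:
        fields = line.split()
        # Skip blank lines; keep a line only on an exact first-field match.
        if fields and fields[0] in wanted:
            kept.append(line)
    return kept


# Hypothetical sample rows, shortened from Table 1.
sample = [
    "A\tMU_ADO_2\t1099\n",
    "C\tMU_ADO_2\t1120\n",
    "CE\tMU_ADO_2\t1120\n",
    "E\tMU_ADO_2\t1120\n",
]
print(filter_by_first_field(sample, ["C", "E"]))
```

A row whose first field is CE is dropped here, whereas the pattern /[CE]/ would keep it because it matches any field that contains a C or an E.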
Given the tags on your question, I assume you are open to other *nix utilities. Here is a sed solution:

sed '/^[^CE]/d' table1.txt > table2.txt
This deletes every line in table1.txt that starts with a character other than C or E (empty lines are left in place).
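For comparison, the delete-what-does-not-match approach can be sketched in Python with the same pattern. This is an illustrative sketch rather than a drop-in replacement for sed: the function name sed_like_delete and the sample text are assumptions. Lines are split without their trailing newline, because sed also matches against the line without it.

```python
import re

# Same pattern as the sed address: a first character that is not C or E.
DELETE = re.compile(r"^[^CE]")


def sed_like_delete(text):
    """Drop every line matching DELETE and keep the rest, like sed's d command."""
    return [line for line in text.splitlines() if not DELETE.match(line)]


# Hypothetical input with one empty line in the middle.
rows = "A 1\nC 2\n\nE 3\nF 4\n"
print(sed_like_delete(rows))
```

The empty line survives because the pattern needs at least one character to match, so only non-empty lines can be deleted.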

Assuming the "C E" list comes from a file:

awk '
    FILENAME == ARGV[1] {list[$1]; next}
    $1 in list {print}
' list.txt table1 > table2
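A rough Python counterpart of this two-file approach reads the keys into a set first and then streams the table once. It is a sketch under assumptions: the function name filter_table is invented here, and the demo writes small made-up files into a temporary directory instead of using the real list.txt and table1.

```python
import os
import tempfile


def filter_table(list_path, table_path, out_path):
    """Copy rows of table_path whose first field appears in list_path."""
    with open(list_path) as f:
        # One key per line; taking the first field mirrors awk's $1.
        keys = {line.split()[0] for line in f if line.strip()}
    with open(table_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if fields and fields[0] in keys:
                dst.write(line)


# Self-contained demo using temporary files with hypothetical contents.
with tempfile.TemporaryDirectory() as tmp:
    list_path = os.path.join(tmp, "list.txt")
    table_path = os.path.join(tmp, "table1")
    out_path = os.path.join(tmp, "table2")
    with open(list_path, "w") as f:
        f.write("C\nE\n")
    with open(table_path, "w") as f:
        f.write("A 1\nC 2\nD 3\nE 4\n")
    filter_table(list_path, table_path, out_path)
    with open(out_path) as f:
        result = f.read()
print(result)
```

Using a set keeps each membership test cheap, which matters once the key list grows beyond a couple of entries.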

Or, in Python:

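keys = ['C', 'E']
# 'a' appends, so rerunning the script adds to any existing out.txt.
with open('out.txt', 'a') as out:
    with open('test.txt') as f:
        for line in f:
            for key in keys:
                # Keep the row if it starts with any of the keys.
                if line.startswith(key):
                    out.write(line)
                    # Stop after the first match so a row is written once.
                    break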
test.txt is the file containing Table 1, copied and pasted. out.txt is the file where you will find Table 2.
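The inner loop over the keys can be folded away, because str.startswith accepts a tuple of prefixes. The following is a minimal sketch of that variant: the helper name copy_matching is made up, and in-memory io.StringIO objects with invented contents stand in for test.txt and out.txt so the example runs on its own.

```python
import io

keys = ("C", "E")


def copy_matching(src, dst, prefixes):
    """Write to dst every line of src that starts with one of prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    dst.writelines(line for line in src if line.startswith(prefixes))


# In-memory stand-ins for test.txt and out.txt (hypothetical contents).
src = io.StringIO("A 1\nC 2\nD 3\nE 4\n")
dst = io.StringIO()
copy_matching(src, dst, keys)
print(dst.getvalue())
```

With real files, src and dst would be the objects returned by open('test.txt') and open('out.txt', 'a').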

How about grep?

You can also redirect the output to a new file:

grep -e '^[CE]' source.file > dest.file

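What have you tried so far?

What do you mean by "C E"? What are you trying to find?

Now that your table has been edited (thanks, F.J), my only question is: what have you tried so far?

In Perl it takes three more characters. So what? You also get infinitely better regular expressions, and a Real Programming Language™. Also note that your code does not do what you say it does.

Whoa, @tchrist, relax. Perl is better than awk. I'm not trying to start a holy war, and I'll delete the comment that upset you. But as far as I can tell this works, so let me know what error you found.

The comment was just provocative. However, your code checks whether the first field contains C or E, which is quite different from $1 == "C" || $1 == "E", which is what your "first field equals C or E" describes. I'm not judging correctness, just pointing out that the description of the code does not match what the code does.

One Perl solution is perl -ne '/^[CE]/ && print', although I find print if /^[CE]/ more readable.

You need a break after the write in the loop to make it more efficient. Alternatively, on Python 2.7+, with open('out.txt', 'a') as out, open('test.txt') as f: is equivalent to the two with lines, followed by out.writelines(line for line in f if any(line.startswith(key) for key in keys)).

@agf, I included a break. As for the rest, I prefer to keep code for the OP as simple as possible, since he seems to be new to SO.

Yes, I'm not really recommending the golfed version, since breaking out of the loop plus the if is fine.

Nice! The progression from awk to sed to grep keeps producing simpler answers.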