Python: compare two lists and find the matching entries
Tags: python, perl, awk
I have a list like this:
C
E
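I want to find these in the table below (Table 1) and write the matching rows to a second table (Table 2).
Does anyone have a Python or Perl script to do this?
Table 1: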
A MU_ADO_2 1099 MU_ADO_2.1099 o o o o o o o o o o 7.82436 s_3_merged Suseptible A AG 2 4 0 2 0
A MU_ADO_2 1105 MU_ADO_2.1105 327.008 s_2_merged Resistance G GT 81 0 2 132 79 31.5281 s_6_merged Resistance G GT 8 0 1 8 7 34.9813 s_3_merged Suseptible G GT 7 0 0 3 7 7.82436 s_7_merged Suseptible G GT 2 0 0 4 2
A MU_ADO_2 1110 MU_ADO_2.1110 515.963 s_2_merged Resistance A AT 113 96 1 2 110 31.5281 s_6_merged Resistance A AT 7 8 0 0 7 16.3388 s_3_merged Suseptible A AT 4 7 0 0 4 13.808 s_7_merged Suseptible A AT 3 3 0 0 3
A MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
B MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
D MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
F MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
Table 2:
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
C MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
E MU_ADO_2 1120 MU_ADO_2.1120 1.00E+06 s_2_merged Resistance A AG 169 107 1 167 1 36.1308 s_6_merged Resistance A AG 8 9 0 8 0 35.6751 s_7_merged Suseptible A AG 7 2 0 7 0 20.9415 s_3_merged Suseptible A AG 5 8 0 5 0
If your question is "How do I filter this file to see only the entries whose first field equals C or E?", then the following should work:
awk '$1 ~ /[CE]/ { print $0 }' yourfile > outfile
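If you want to save a few keystrokes at the expense of clarity, the following also works, because awk prints the line by default when a pattern has no action: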
awk '$1 ~ /[CE]/' yourfile > outfile
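Note that $1 ~ /[CE]/ is a regular-expression match, so it succeeds for any first field that contains a C or an E, not only for fields that are exactly C or E. On the table in the question the two give the same result, because every first field is a single letter. A minimal Python sketch of the difference, using made-up rows (the "ACE" row is hypothetical and does not appear in the question):

```python
import re

# Hypothetical sample rows; only the first whitespace-separated field matters.
rows = [
    "C MU_ADO_2 1120",
    "ACE MU_ADO_2 1120",
    "E MU_ADO_2 1120",
    "D MU_ADO_2 1120",
]

def first_field(row):
    """Return the first whitespace-separated field, like awk's $1."""
    fields = row.split()
    return fields[0] if fields else ""

# Like awk's  $1 ~ /[CE]/ : the field only has to contain C or E somewhere.
contains = [r for r in rows if re.search(r"[CE]", first_field(r))]

# Like awk's  $1 == "C" || $1 == "E" : the field has to be exactly C or E.
equals = [r for r in rows if first_field(r) in {"C", "E"}]

print(contains)  # includes the "ACE" row
print(equals)    # only the "C" and "E" rows
```

If exact matching is what you need, the equality form is the safer choice.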
Judging by your tags, I assume you are open to other *nix utilities, so here is a sed solution:
sed '/^[^CE]/d' table1.txt > table2.txt
This deletes every line in table1.txt whose first character is not C or E.
Assuming the "C E" list comes from a file:
awk '
FILENAME == ARGV[1] {list[$1]; next}
$1 in list {print}
' list.txt table1 > table2
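The awk program above makes two passes: while reading the first file it stores each first field as a key, and while reading the second it prints only the rows whose first field is one of those keys. The same idea might be sketched in Python as follows. The function names are made up for illustration, and the sketch assumes the list file holds one key per line, as in the question.

```python
def load_keys(list_lines):
    """Collect the first field of every non-blank line of the key list."""
    return {line.split()[0] for line in list_lines if line.strip()}

def filter_rows(list_lines, table_lines):
    """Keep the table lines whose first field appears in the key list."""
    keys = load_keys(list_lines)
    return [line for line in table_lines
            if line.strip() and line.split()[0] in keys]

def filter_files(list_path, table_path, out_path):
    """File-based wrapper: read the list and the table, write the kept rows."""
    with open(list_path) as lst, open(table_path) as table:
        kept = filter_rows(lst, table)
    with open(out_path, "w") as out:
        out.writelines(kept)

# In-memory demonstration with made-up rows.
demo_table = ["A x 1\n", "C x 2\n", "E x 3\n", "CE x 4\n"]
demo_result = filter_rows(["C\n", "E\n"], demo_table)
print(demo_result)
```

Because the keys live in a set and are compared with the whole first field, a row such as "CE x 4" is not selected. This matches the behaviour of $1 in list in the awk version.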
Or, in Python, with the keys hard-coded:
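keys = ['C', 'E']
# 'a' appends to out.txt, so delete the file first if you rerun the script
with open('out.txt', 'a') as out:
    with open('test.txt') as f:
        for line in f:
            for key in keys:
                if line.startswith(key):
                    out.write(line)
                    break  # the line is already written; skip the remaining keys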
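test.txt is the file containing Table 1, copied and pasted. out.txt is the file in which you will get Table 2.
How about grep? You can also redirect its output into a new file: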
grep -e '^[CE]' source.file > dest.file
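What have you tried so far?
What do you mean by "C E"? What do you want to find?
Now that your table has been edited (thanks F.J), my only question is: what have you tried so far?
In Perl it takes three more characters. So what? You also get infinitely better regexes, and a Real™ programming language. Also note that your code doesn't do what you say it does. Oops.
@tchrist Relax, Perl is better than awk. I'm not trying to start a holy war, and I'll delete the comment that upset you. But as far as I can tell this works; let me know what error you found.
The comment was merely provocative. However, your code checks whether the first field contains C or E, which is quite different from what your "first field equals C or E" says, namely $1 == "C" || $1 == "E". I'm not judging correctness, just pointing out that the description doesn't match what the code does.
One Perl solution is perl -ne '/^[CE]/ && print', although I find print if /^[CE]/ more readable.
You need a break after the write in the loop to make it more efficient. Or, on Python 2.7+, use with open('out.txt', 'a') as out, open('test.txt') as f: in place of the two with lines, and then out.writelines(line for line in f if any(line.startswith(key) for key in keys)).
@agf, I've included a break. As for the rest, I prefer to keep code for the OP as simple as possible, since he seems to be new to SO.
Yes, I wasn't really recommending the golfed version, since the loop with break + if is fine.
Nice! The progression from awk to sed to grep keeps leading to simpler answers.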
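For readers curious about the compact form suggested in the comments, here is a hedged, runnable sketch of it. It differs from the answer in two ways, both of which are assumptions of this sketch: it opens the output with "w" rather than "a", so reruns overwrite instead of appending, and it passes a tuple to str.startswith rather than looping over the keys. The function name copy_matching and the sample file contents are made up for illustration.

```python
import os
import tempfile

keys = ("C", "E")  # str.startswith accepts a tuple of prefixes

def copy_matching(src_path, dst_path, prefixes=keys):
    """Copy the lines of src_path that start with any prefix to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(line for line in src if line.startswith(prefixes))

# Small self-check on throwaway files with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "test.txt")
    dst_file = os.path.join(tmp, "out.txt")
    with open(src_file, "w") as f:
        f.write("A 1\nC 2\nD 3\nE 4\n")
    copy_matching(src_file, dst_file)
    with open(dst_file) as f:
        result = f.read()

print(result)
```

Like grep '^[CE]', this is a prefix test on the whole line, so a row whose first field merely begins with C or E (for example "CX") would also be copied.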