Removing duplicate key-value pairs from a Python dictionary


I have a file from which I need to remove the duplicate pairs (marked in bold).

Input file:

AT1G01010 = 0005634
**AT1G01010 = 0006355**
AT1G01010 = 0003677
AT1G01010 = 0007275
**AT1G01010 = 0006355
AT1G01010 = 0006355**
AT1G01010 = 0006888
**AT1G01020 = 0016125**
AT1G01020 = 0016020
**AT1G01020 = 0005739**
**AT1G01020 = 0016125**
AT1G01020 = 0003674
AT1G01020 = 0005783
**AT1G01020 = 0005739**
**AT1G01020 = 0006665
AT1G01020 = 0006665**

Expected output:

AT1G01010 = 0005634
AT1G01010 = 0006355
AT1G01010 = 0003677
AT1G01010 = 0007275
AT1G01010 = 0006888
AT1G01020 = 0016125
AT1G01020 = 0016020
AT1G01020 = 0005739
AT1G01020 = 0003674
AT1G01020 = 0005783
AT1G01020 = 0006665

To remove the duplicates, I first built a dictionary. After creating the dictionary, I tried the following code:

import sys

ara_go_file = open (sys.argv[1]).readlines()

ara_id_list = []
ara_go_list  = []


for lines in ara_go_file:
    split_lines = lines.split(' ')
    ara_id      = split_lines[0]
    ara_id_list.append(ara_id)

    go_id_split = split_lines[-1]
    go_id       = go_id_split.split('\n')[0]
    ara_go_list.append(go_id)

ara_id_go_dic = dict (zip(ara_id_list, ara_go_list))  ##ara_id_go_dic  (this is the name of the dict I have created)

new_dict = {}  # made a new dict to copy the data into this n remove the duplicate pairs

for k in ara_id_go_dic.items():
    if k[0] in new_dict:
        if k[1] not in new_dict[k[0]]:
            new_dict[k[0]].append(k[1])
        else:
            new_dict[k[0]]=[k[1]]

print new_dict

I don't know exactly where I went wrong.

Please point out my mistake, or suggest another way to remove the duplicate pairs.

You can use `set` to remove duplicate elements:

>>> s="""AT1G01010 = 0005634
... AT1G01010 = 0006355
... AT1G01010 = 0003677
... AT1G01010 = 0007275
... AT1G01010 = 0006355
... AT1G01010 = 0006355
... AT1G01010 = 0006888
... AT1G01020 = 0016125
... AT1G01020 = 0016020
... AT1G01020 = 0005739
... AT1G01020 = 0016125
... AT1G01020 = 0003674
... AT1G01020 = 0005783
... AT1G01020 = 0005739
... AT1G01020 = 0006665
... AT1G01020 = 0006665"""
>>> for j in set([i for i in s.split('\n')]):
...     print j
... 
AT1G01010 = 0005634
AT1G01020 = 0016020
AT1G01010 = 0007275
AT1G01010 = 0006355
AT1G01020 = 0006665
AT1G01010 = 0003677
AT1G01020 = 0005783
AT1G01020 = 0016125
AT1G01020 = 0005739
AT1G01020 = 0003674
AT1G01010 = 0006888
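
Note that a `set` has no defined order, which is why the lines above come out shuffled, while the expected output in the question keeps the original order. A minimal sketch of an order-preserving variant follows; the helper name `dedupe_keep_order` and the shortened `sample` string are illustrative assumptions, and the snippet is written for Python 3 (`print` as a function).

```python
def dedupe_keep_order(lines):
    """Return lines with duplicates removed, keeping the first occurrence of each."""
    seen = set()      # fast membership test for lines already emitted
    unique = []       # result list, in first-seen order
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


# Shortened sample taken from the question's input (hypothetical subset).
sample = """AT1G01010 = 0005634
AT1G01010 = 0006355
AT1G01010 = 0003677
AT1G01010 = 0006355
AT1G01020 = 0016125
AT1G01020 = 0016125"""

for line in dedupe_keep_order(sample.split("\n")):
    print(line)
```

The `seen` set does the duplicate check, while the separate list is what preserves the order of first appearance.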

Use the csv module together with a set:

  • Read the input file with the csv module and build a set of tuples. A set does not store duplicate values.
  • Write the output to a new file.
  • Input: same as in the question.
  • Demo:

    import csv

    p = "dp-input.txt"
    result = set()
    # Python 2: the csv module expects files opened in binary mode
    with open(p, "rb") as fp:
        reader = csv.reader(fp, delimiter='=')
        for row in reader:
            # tuples are hashable, so identical pairs collapse into one set entry
            result.add((row[0], row[1]))

    p1 = "dp-output.txt"
    with open(p1, "wb") as fp:
        writer = csv.writer(fp, delimiter='=')
        writer.writerows(result)

  • Output:

    AT1G01010 = 0006888
    AT1G01020 = 0016020
    AT1G01020 = 0005739
    AT1G01010 = 0007275
    AT1G01020 = 0003674
    AT1G01020 = 0016125
    AT1G01020 = 0005783
    AT1G01020 = 0006665
    AT1G01010 = 0003677
    AT1G01010 = 0005634
    AT1G01010 = 0006355
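
As for the dictionary approach tried in the question: as the code is shown there, `dict(zip(ara_id_list, ara_go_list))` can hold only one value per key, so most GO terms are lost before the loop starts, and the `else` is attached to the inner `if`, so with an initially empty `new_dict` nothing is ever added. A minimal sketch of one way to group distinct values per key is below; the function name `group_unique` and the sample `lines` are assumptions made for illustration, and the code is written for Python 3.

```python
from collections import OrderedDict


def group_unique(lines):
    """Map each ID to the list of its distinct GO terms, in first-seen order."""
    grouped = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # split on the first '=' and drop the surrounding spaces
        ara_id, _, go_id = line.partition("=")
        ara_id, go_id = ara_id.strip(), go_id.strip()
        # create the list on first sight of the key, then reuse it
        go_ids = grouped.setdefault(ara_id, [])
        if go_id not in go_ids:
            go_ids.append(go_id)
    return grouped


# Hypothetical sample in the same format as the question's input file.
lines = [
    "AT1G01010 = 0005634\n",
    "AT1G01010 = 0006355\n",
    "AT1G01010 = 0006355\n",
    "AT1G01020 = 0016125\n",
    "AT1G01020 = 0016125\n",
]
for ara_id, go_ids in group_unique(lines).items():
    for go_id in go_ids:
        print("%s = %s" % (ara_id, go_id))
```

Building the dictionary directly from the lines, rather than through `zip`, is what keeps every value for a repeated key.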
    

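What output do you get? I get an empty dictionary.

How is the `ara_id_go_dic` dictionary created? Can you print it?

OK, I'll add the code showing how I create the dict.

Duplicate checking is exactly what Python's set() is for. Give it a try.

In fact, strings are already immutable; you would only convert each line to a pair if there were a risk of differing formatting. Otherwise `set(ara_go_file)` is enough.

@FrancisColas The result of `split` is a `list`, and lists are mutable, but yes, it is redundant work ;)

You need to `split` the input into a `list` because you have it in a single string, but in the OP's case it is a file, so that is a self-imposed constraint.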
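
To illustrate the point made in the comments about differing formatting, here is a small sketch of when converting each line to a pair matters. The helper `unique_pairs` and the sample lines (including the variant without spaces around `=`) are hypothetical and not taken from the question's file; the code is written for Python 3.

```python
def unique_pairs(lines):
    """Collect distinct (id, go) pairs, ignoring spacing around '='."""
    pairs = set()
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that contain no '='
        pairs.add((key.strip(), value.strip()))
    return pairs


# Hypothetical input: the first two lines differ only in spacing.
lines = ["AT1G01010 = 0006355\n", "AT1G01010=0006355\n", "AT1G01010 = 0003677\n"]
print(len(set(lines)))           # 3: the raw strings are all different
print(len(unique_pairs(lines)))  # 2: the normalized pairs are not
```

If every line in the file is formatted identically, the plain set of lines gives the same result with less work.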