Python: combine the values of all dicts that share one specific key

I know there are a million questions like this, but I just can't find an answer that works for me.

I have this:

list1 =   [{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H']}, {'assembly_id': '1', 'asym_id_list': ['C', 'D', 'F', 'I', 'J']}, {'assembly_id':2,'asym_id_list':['D,C'],'auth_id_list':['C','V']}]
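If the assembly_id is the same, I want to combine the other matching keys of the dicts.

In this example, assembly_id '1' appears twice, so the input above would become: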

[{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']}, {'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']}]
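In theory there can be n occurrences of an assembly_id (i.e. assembly 1 could appear in 10 or 20 dicts, not just 2), and there can be up to two other lists to combine (asym_id_list and auth_id_list).

I was working on this approach: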

new_dict = {}
assembly_list = [] #to keep track of assemblies already seen
for dict_name in list1: #for each dict in the list
        if dict_name['assembly_id'] not in assembly_list: #if the assembly id is new
                new_dict['assembly_id'] = dict_name #this line is wrong, add the entry to new_dict
                assembly_list.append(new_dict['assembly_id']) #append the id to 'assembly_list'
        else:
                new_dict['assembly_id'].append(dict_name) #else if it's already seen, append the dictionaries together, this is wrong
print(new_dict)
The output is wrong:

{'assembly_id': {'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']}}

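But I think the idea is right: I should start a new list and keep track of what I've seen, adding an entry if it hasn't been seen before, and if it has been seen... combining them? It's just the details I'm not getting.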

Use a dict keyed on assembly_id to collect all the data for a given key; then, if needed, you can go back and build a list of dicts in the original format.

>>> from collections import defaultdict
>>> from typing import Dict, List
>>> id_lists: Dict[str, List[str]] = defaultdict(list)
>>> for d in list1:
...     id_lists[d['assembly_id']].extend(d['asym_id_list'])
...
>>> combined_list = [{
...     'assembly_id': id, 'asym_id_list': id_list
... } for id, id_list in id_lists.items()]
>>> combined_list
[{'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']}, {'assembly_id': 2, 'asym_id_list': ['D,C']}]
>>>

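(Edit) I missed the part about auth_id_list because it was hidden in the scroll of the original code block. The same strategy applies: either use two dicts in the first step, or make it a dict of some collection of lists (e.g. a dict of dicts of lists, with the outer dict keyed on the assembly_id value and the inner dict keyed on the original field name).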
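
The "dict of dicts of lists" variant mentioned in the edit could look roughly like the sketch below. It is only an illustration, not the answerer's code: the name grouped is made up here, and it assumes every field other than assembly_id holds a list.

```python
from collections import defaultdict

list1 = [
    {'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H']},
    {'assembly_id': '1', 'asym_id_list': ['C', 'D', 'F', 'I', 'J']},
    {'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']},
]

# Outer dict keyed on the assembly_id value, inner dict keyed on the
# original field name. Assumption: every non-id field is a list.
grouped = defaultdict(lambda: defaultdict(list))
for d in list1:
    for field, values in d.items():
        if field != 'assembly_id':
            grouped[d['assembly_id']][field].extend(values)

# Rebuild the original list-of-dicts shape from the grouped data.
combined_list = [
    {'assembly_id': assembly_id, **fields}
    for assembly_id, fields in grouped.items()
]
print(combined_list)
```

Because the inner loop walks whatever fields are present, a third list field would be merged without any further change.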

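Your reasoning is on the right track. We can use a dictionary m, holding key-value pairs of each assembly_id and its corresponding dict, to keep track of the assembly_ids we have visited. Whenever we encounter a new assembly_id we add it to m; otherwise, if m already contains that assembly_id, we just extend the asym_id_list and auth_id_list for that assembly_id.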

def merge(dicts):
    m = {} # keeps track of the visited assembly_ids
    for d in dicts:
        key = d['assembly_id'] # assembly_id is used as merge/grouping key
        if key in m:
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = m[key].get('asym_id_list', []) + d['asym_id_list']
            elif 'auth_id_list' in d:
                m[key]['auth_id_list'] = m[key].get('auth_id_list', []) + d['auth_id_list']
        else:
            m[key] = d
    return list(m.values())
Result:

# merge(list1)
[
    {
        'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']
    },
    {
        'assembly_id': 2, 'asym_id_list': ['D,C'], 'auth_id_list': ['C', 'V']
    }
]

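@Samwise has given a good answer to the question you asked, and this is not meant to replace it. However, I am going to suggest a change to how you keep the data after merging. I would put this in a comment, but there is no way to preserve code formatting in comments, and it is also a bit too big.

Before that, I think there is a typo in your example data. I think you meant the 'D,C' in 'assembly_id': 2, 'asym_id_list': ['D,C'] to be separate strings, like this: 'assembly_id': 2, 'asym_id_list': ['D', 'C']. I will assume so below, but if not, it does not change any of the code or comments.

Instead of your merged structure being a list of dictionaries like this: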

merge_l = [
            {'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
            {'assembly_id': 2, 'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
          ]
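I suggest not using a list as the top-level structure, but a dictionary keyed by the value of assembly_id. So it would be a dictionary whose values are dictionaries, like this: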

merge_d = { '1': {'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
            '2': {'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
          }
Or, if you also want to keep the 'assembly_id', like this:

merge_d = { '1': {'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
            '2': {'assembly_id': 2, 'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']}
          }
This last one can be achieved by changing the return value of @Samwise's merge() method: just return m instead of converting the dict to a list.
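
As a rough illustration of why a dictionary keyed by assembly_id can be convenient, here is a small sketch using the merge_d data from above. The id '99' is a made-up example of a missing key, and the variable names entry and missing are only for illustration.

```python
merge_d = {
    '1': {'assembly_id': '1', 'asym_id_list': ['A', 'B', 'E', 'G', 'H', 'C', 'D', 'F', 'I', 'J']},
    '2': {'assembly_id': 2, 'asym_id_list': ['D', 'C'], 'auth_id_list': ['C', 'V']},
}

# Direct lookup by assembly_id; no need to scan a list for a matching dict.
entry = merge_d['1']
print(entry['asym_id_list'])

# .get() returns None instead of raising KeyError for an id that never appeared.
missing = merge_d.get('99')

# The list-of-dicts form is still available when other code expects it.
merge_l = list(merge_d.values())
```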

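Another comment on @Samwise's code is that the combined lists can contain duplicates. So if the original data has 'asym_id_list': ['A', 'B'] in one entry and 'asym_id_list': ['B', 'C'] in another, the combined list will contain 'asym_id_list': ['A', 'B', 'B', 'C']. That may be what you want, but if you want to avoid it, you can use sets instead of lists as the inner containers for the asym_id and auth_id values.

In @Samwise's answer, change it as follows: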

def merge(dicts):
    m = {} # keeps track of the visited assembly_ids
    for d in dicts:
        key = d['assembly_id'] # assembly_id is used as merge/grouping key
        if key in m:
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = m[key].get('asym_id_list', set()) | set(d['asym_id_list'])
            if 'auth_id_list' in d:
                m[key]['auth_id_list'] = m[key].get('auth_id_list', set()) | set(d['auth_id_list'])
        else:
            m[key] = {'assembly_id': d['assembly_id']}
            if 'asym_id_list' in d:
                m[key]['asym_id_list'] = set(d['asym_id_list'])
            if 'auth_id_list' in d:
                m[key]['auth_id_list'] = set(d['auth_id_list'])
    return m
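
One caveat with sets is that they do not keep insertion order. If order matters, a possible alternative (not part of the answer above) is to drop duplicates with dict.fromkeys, which keeps the first occurrence of each item on Python 3.7+. The helper name extend_unique is hypothetical.

```python
def extend_unique(existing, new_items):
    """Return existing + new_items with duplicates dropped, keeping first occurrences."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys([*existing, *new_items]))


print(extend_unique(['A', 'B'], ['B', 'C']))  # ['A', 'B', 'C']
```

This keeps the values as lists, so the key names ending in _list stay accurate.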

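If you do this, you may want to reconsider the key names 'asym_id_list' and 'auth_id_list', since they are now sets rather than lists. But that may be constrained by other code and what it expects.

Reconsider the outermost loop: you want to return a list of dictionaries. You can append the dict to that list if you have not seen its assembly_id before, and if you have seen it, just extend the lists 'asym_id_list' and 'auth_id_list'.

I think there are two typos in your example data. I think you meant 'assembly_id': 2, 'asym_id_list': ['D,C'] to be separate strings, like this: 'assembly_id': 2, 'asym_id_list': ['D', 'C']. Also, in the 'assembly_id' key you mix strings and integers (i.e. '1' and 2). While that will work, I am guessing you do not want the keys to be a mix of ints and strings.

You have a small bug in this code. The line elif 'auth_id_list' in d: should not be an elif; it should be if 'auth_id_list' in d:, because an entry can contain both 'asym_id_list' and 'auth_id_list'.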
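
The fix described in the last comment can be sketched as follows. This is one possible rewrite of the list-based merge(), not the original answerer's code: the list_keys parameter is an assumption added for illustration, and fresh lists are built so the input dicts are not modified.

```python
def merge(dicts, list_keys=('asym_id_list', 'auth_id_list')):
    m = {}  # maps each visited assembly_id to its merged dict
    for d in dicts:
        key = d['assembly_id']  # assembly_id is the grouping key
        if key not in m:
            m[key] = {'assembly_id': key}
        # Check every list key independently (plain "if", never "elif"),
        # because one entry may carry both lists.
        for name in list_keys:
            if name in d:
                m[key].setdefault(name, []).extend(d[name])
    return list(m.values())


# An entry that carries both lists twice, which the elif version mishandles.
data = [
    {'assembly_id': '1', 'asym_id_list': ['A'], 'auth_id_list': ['X']},
    {'assembly_id': '1', 'asym_id_list': ['B'], 'auth_id_list': ['Y']},
]
print(merge(data))
```

With the elif version, the second entry's auth_id_list would be skipped whenever asym_id_list is also present.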