Python: grouping similar values in a dictionary

Tags: python, pandas, dictionary, group-by

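I'm new to programming and would appreciate help with the following in Python/Pandas. I have a dictionary whose values are lists. I want to group together the keys that have similar values. I have seen similar questions here, but in this case I want to ignore the order of the values, for example:

classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}

jack and charles have the same values, but in a different order. I want output that groups keys by their values regardless of order. In this case, the output would be: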

['20','male','soccer']: jack, charles
['26','male','tennis']: brian
['19','basketball','male']: zulu

You can invert the dictionary the way you want with the following code:

classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}

out_dict = {}
for key, value in classmates.items():
    current_list = out_dict.get(tuple(sorted(value)), [])
    current_list.append(key)
    out_dict[tuple(sorted(value))] = current_list

print(out_dict)
On Python 3.7+, where dictionaries preserve insertion order, this prints:

{('20', 'male', 'soccer'): ['jack', 'charles'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}
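
The question asks for one line per group in the form "values: name, name". As a minimal sketch of that last step, the grouping above can be wrapped in a function and the result formatted afterwards. The function names group_keys_by_values and format_groups are hypothetical, chosen only for this illustration.

```python
def group_keys_by_values(mapping):
    """Group keys whose value lists hold the same items, ignoring order."""
    groups = {}
    for name, values in mapping.items():
        # sorted() gives a canonical order; tuple() makes the result hashable.
        groups.setdefault(tuple(sorted(values)), []).append(name)
    return groups


def format_groups(groups):
    """Render one 'values: name, name' line per group."""
    return [f"{list(values)}: {', '.join(names)}" for values, names in groups.items()]


classmates = {
    'jack': ['20', 'male', 'soccer'],
    'brian': ['26', 'male', 'tennis'],
    'charles': ['male', 'soccer', '20'],
    'zulu': ['19', 'basketball', 'male'],
}

for line in format_groups(group_keys_by_values(classmates)):
    print(line)
```

Using setdefault avoids computing the sorted tuple twice per key, which the loop above does once for the lookup and once for the assignment.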

Using frozensets with apply, then groupby + agg:

import pandas as pd

s = pd.DataFrame(classmates).T.apply(frozenset, axis=1)

s2 = pd.Series(s.index.values, index=s)\
          .groupby(level=0).agg(lambda x: list(x))

s2
(soccer, 20, male)        [charles, jack]
(26, male, tennis)                [brian]
(basketball, male, 19)             [zulu]
dtype: object
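
A frozenset key and a sorted-tuple key both ignore order, but they are not interchangeable. The sketch below, which uses only the standard library and hypothetical helper names (sorted_key, set_key, count_key), shows where they differ: a frozenset drops repeated items, while sorted() needs items that can be compared with each other.

```python
from collections import Counter


def sorted_key(values):
    # Order-insensitive, keeps duplicates; items must be mutually comparable.
    return tuple(sorted(values))


def set_key(values):
    # Order-insensitive, but repeated items collapse into one.
    return frozenset(values)


def count_key(values):
    # Order-insensitive multiset: keeps duplicates and only needs hashable items.
    return frozenset(Counter(values).items())


with_repeat = ['male', 'male', 'soccer']
without_repeat = ['male', 'soccer']

print(sorted_key(with_repeat) == sorted_key(without_repeat))  # False
print(set_key(with_repeat) == set_key(without_repeat))        # True
print(count_key(with_repeat) == count_key(without_repeat))    # False

# Mixed types: sorted() raises TypeError in Python 3; the other two keys still work.
mixed = ['male', 20]
try:
    sorted_key(mixed)
except TypeError:
    print('sorted_key cannot compare str with int')
print(set_key(mixed) == set_key([20, 'male']))  # True
```

For the data in the question (all strings, no repeats within a list) the three keys produce the same groups, so the choice only matters for messier input.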

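from collections import defaultdict

ans = defaultdict(list)
classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']
              }
for k, v in classmates.items():
    # Sorting makes the key independent of the original order of the values
    sorted_tuple = tuple(sorted(v))
    ans[sorted_tuple].append(k)
# ans is the dict you want:
# defaultdict(<class 'list'>, {('20', 'male', 'soccer'): ['jack', 'charles'],
# ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']})
for k, v in ans.items():
    print(k, ':', v)
# Output:
# ('20', 'male', 'soccer') : ['jack', 'charles']
# ('26', 'male', 'tennis') : ['brian']
# ('19', 'basketball', 'male') : ['zulu']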
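
For comparison, the same grouping can be written with itertools.groupby instead of defaultdict. This is a different technique from the answer above, sketched here only as an alternative; the helper name canonical is hypothetical. Because groupby merges adjacent items only, the items have to be sorted by the same key first.

```python
from itertools import groupby

classmates = {
    'jack': ['20', 'male', 'soccer'],
    'brian': ['26', 'male', 'tennis'],
    'charles': ['male', 'soccer', '20'],
    'zulu': ['19', 'basketball', 'male'],
}


def canonical(item):
    """Return the order-independent key for a (name, values) pair."""
    _name, values = item
    return tuple(sorted(values))


# groupby only merges adjacent items, so sort by the same key first.
ordered = sorted(classmates.items(), key=canonical)
grouped = {
    key: [name for name, _values in items]
    for key, items in groupby(ordered, key=canonical)
}
print(grouped)
```

The groups come out ordered by key rather than by first appearance, and the sort adds an O(n log n) step that the defaultdict version does not need.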

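First, convert the dictionary to a DataFrame:

df = pd.DataFrame.from_dict(classmates, orient='index')

Then sort by age in ascending order:

df = df.sort_values(by=0, ascending=True)

Here 0 is the default column name; you can rename this column. This assumes the age is the first item in every list, which does not hold for charles in the example.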

You can do this in one line, where a is the input dictionary:

print({tuple(sorted(v)) : [k for k,vv in a.items() if sorted(vv) == sorted(v)] for v in a.values()})

Here is the detailed solution:

dict_1 = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'], 'charles': ['male', 'soccer', '20'],
     'zulu': ['19', 'basketball', 'male']}

sorted_dict = {}
for key,value in dict_1.items():
    sorted_1 = sorted(value)
    sorted_dict[key] = sorted_1

tracking_of_duplicate = []
final_dict = {}
for key1,value1 in sorted_dict.items():
    if value1 not in tracking_of_duplicate:
        tracking_of_duplicate.append(value1)
        final_dict[tuple(value1)] = [key1]
    else:
        final_dict[tuple(value1)].append(key1)

print(final_dict)
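
As a rough check, the one-liner and the detailed loop can be wrapped in functions and compared. The names group_one_liner and group_detailed are hypothetical, and the loop below is a simplified variant that tests membership in final_dict directly instead of keeping a separate tracking list. Note that the one-liner rescans the whole dictionary for every value, so its cost grows quadratically with the number of keys.

```python
def group_one_liner(a):
    # Rebuilds each group by scanning every item again: O(n^2) comparisons.
    return {
        tuple(sorted(v)): [k for k, vv in a.items() if sorted(vv) == sorted(v)]
        for v in a.values()
    }


def group_detailed(a):
    # Single pass; a dict lookup replaces the separate list of seen values.
    final_dict = {}
    for key, value in a.items():
        sorted_key = tuple(sorted(value))
        if sorted_key not in final_dict:
            final_dict[sorted_key] = [key]
        else:
            final_dict[sorted_key].append(key)
    return final_dict


dict_1 = {
    'jack': ['20', 'male', 'soccer'],
    'brian': ['26', 'male', 'tennis'],
    'charles': ['male', 'soccer', '20'],
    'zulu': ['19', 'basketball', 'male'],
}

print(group_one_liner(dict_1) == group_detailed(dict_1))  # True
```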

Does agg need lambda x: list(x)? Isn't that just agg(list)?
@AdamSmith Yes, otherwise you get TypeError: 'type' object is not iterable
Thanks, my pandas-fu is weak at best!