How to rename identical values in a Python dictionary?
I have a dictionary in which I want to rename identical values, like this:
{
33: [3, 4, 6],
34: [3, 4, 6],
35: [3, 4, 6],
99: [7, 8],
100: [7, 8],
124: [0, 1, 2, 5],
125: [0, 1, 2, 5],
126: [0, 1, 2, 5],
127: [0, 1, 2, 5]
}
I need to turn it into:
{
33: Cluster1,
34: Cluster1,
35: Cluster1,
99: Cluster2,
100: Cluster2,
124: Cluster3,
125: Cluster3,
126: Cluster3,
127: Cluster3
}
Any hints would be appreciated.
You can use a defaultdict and sort the initial dictionary by key:
from collections import defaultdict

d = defaultdict(str)
counter = 1
s = {33: [3, 4, 6],
     34: [3, 4, 6],
     35: [3, 4, 6],
     99: [7, 8],
     100: [7, 8],
     124: [0, 1, 2, 5],
     125: [0, 1, 2, 5],
     126: [0, 1, 2, 5],
     127: [0, 1, 2, 5]}
# sort by key; this assumes identical values sit on consecutive keys
new_s = sorted(s.items(), key=lambda x: x[0])
d1 = new_s[0][-1]  # value of the current cluster
for a, b in new_s:
    if b == d1:
        d[a] = "Cluster{}".format(counter)
    else:
        # the value changed, so start a new cluster
        counter += 1
        d[a] = "Cluster{}".format(counter)
    d1 = b
for a, b in sorted(d.items(), key=lambda x: x[0]):
    print(a, b)
Output (Python 2 print format; Python 3 prints 33 Cluster1 and so on):
(33, 'Cluster1')
(34, 'Cluster1')
(35, 'Cluster1')
(99, 'Cluster2')
(100, 'Cluster2')
(124, 'Cluster3')
(125, 'Cluster3')
(126, 'Cluster3')
(127, 'Cluster3')
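Edit:
A more robust solution, which sorts the items by value and groups identical values with itertools.groupby, so it also works when identical values are not on consecutive keys: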
s = {9: [49, 50, 51], 10: [49, 50, 51], 11: [49, 50, 51], 13: [13, 14, 15, 16, 17], 18: [28, 29], 21: [38, 39, 40], 22: [38, 39, 40], 23: [38, 39, 40], 25: [28, 29], 33: [4, 5, 6], 34: [4, 5, 6], 35: [4, 5, 6], 36: [24, 25], 37: [24, 25], 40: [13, 14, 15, 16, 17]}
import itertools

final_dict = {}
# sort the items by value so identical lists become adjacent, then group them
by_value = sorted(s.items(), key=lambda x: x[1])
for i, (value, items) in enumerate(itertools.groupby(by_value, key=lambda x: x[1]), 1):
    for key, _ in items:
        final_dict[key] = "Cluster{}".format(i)
for a, b in sorted(final_dict.items(), key=lambda x: x[0]):
    print(a, b)
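The sort before itertools.groupby is what makes this version robust: groupby only merges runs of adjacent equal keys, so without sorting by value, non-consecutive duplicates (such as keys 13 and 40 in the dictionary above) land in separate groups. A small illustration with made-up values:

```python
from itertools import groupby

# made-up values: [28, 29] appears twice, but not next to itself
unsorted_values = [[28, 29], [4, 5, 6], [28, 29]]

# groupby only merges *adjacent* equal items, so [28, 29] forms two groups
print([k for k, _ in groupby(unsorted_values)])
# [[28, 29], [4, 5, 6], [28, 29]]

# after sorting, equal values are adjacent and form a single group
print([k for k, _ in groupby(sorted(unsorted_values))])
# [[4, 5, 6], [28, 29]]
```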
Output:
(9, 'Cluster6')
(10, 'Cluster6')
(11, 'Cluster6')
(13, 'Cluster2')
(18, 'Cluster4')
(21, 'Cluster5')
(22, 'Cluster5')
(23, 'Cluster5')
(25, 'Cluster4')
(33, 'Cluster1')
(34, 'Cluster1')
(35, 'Cluster1')
(36, 'Cluster3')
(37, 'Cluster3')
(40, 'Cluster2')
{33: 'Cluster1',
34: 'Cluster1',
35: 'Cluster1',
99: 'Cluster3',
100: 'Cluster3',
124: 'Cluster2',
125: 'Cluster2',
126: 'Cluster2',
127: 'Cluster2'}
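One thing to keep in mind: lists are not hashable, so identical lists cannot be used directly as dictionary keys. Check the following: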
my_dict = {33: [3, 4, 6],
34: [3, 4, 6],
35: [3, 4, 6],
99: [7, 8],
100: [7, 8],
124: [0, 1, 2, 5],
125: [0, 1, 2, 5],
126: [0, 1, 2, 5],
127: [0, 1, 2, 5]}
# the values of keys 33 and 34 are equal but distinct list objects
print(id(my_dict[33]))
print(id(my_dict[34]))
def to_hash_str(my_list):
    from hashlib import sha256
    import json
    return sha256(json.dumps(my_list).encode('utf-8')).hexdigest()
clusters_mapping = {to_hash_str(v): v for v in my_dict.values()}
print(clusters_mapping)
new_dict = {k: clusters_mapping[to_hash_str(v)] for k, v in my_dict.items()}
print(new_dict)
# now keys 33 and 34 share the same list object
print(id(new_dict[33]))
print(id(new_dict[34]))
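A lighter-weight alternative to hashing a JSON dump, sketched here under the assumption that the values are lists of hashable items such as ints: convert each list to a tuple, which is hashable, and use it as the key directly. The names canonical and unhashable are chosen for this example only.

```python
my_dict = {33: [3, 4, 6], 34: [3, 4, 6], 99: [7, 8], 100: [7, 8]}

unhashable = False
try:
    hash(my_dict[33])
except TypeError:
    unhashable = True  # lists cannot be dict keys or set members

# tuples of ints are hashable, so tuple(v) can key a dict directly;
# setdefault stores the first list seen for each value and returns it afterwards
canonical = {}
new_dict = {k: canonical.setdefault(tuple(v), v) for k, v in my_dict.items()}

print(unhashable)                    # True
print(new_dict[33] is new_dict[34])  # True: both keys share one list object
print(len(canonical))                # 2 distinct values
```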
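One way to do it:
1. Create an empty dict.
2. Create a list called seen to hold the values you have already encountered.
3. Loop over the original dictionary; if you encounter a value you have not seen yet, add it to seen, otherwise move on.
4. Add the current item to the new dict under its original key, with a cluster name based on the index of its value in seen.
The code: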
seen = []  # distinct values, in the order they were first encountered
dct = {}
for k in d:  # d is the original dictionary
    if d[k] not in seen:
        seen.append(d[k])
    dct[k] = "Cluster{}".format(seen.index(d[k]) + 1)
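Tested, and it works for your case.

It is not quite clear what you want. Are the names strings? Are they variables that stand for the previous value?

I think he means variables.

Yes, Cluster1 stands for the value of key 33, i.e. [3,4,6]; Cluster2 will replace the value [7,8], and so on.

Check the data and only add a new Cluster+i if the value has not been seen before.

Thanks for the tip, just one question: when I print d.items, what do the last three numbers represent?

@juanman: I'm not entirely sure what you mean. I don't understand what these are: [33,'Cluster1',34,'Cluster1',35,'Cluster1',99,'Cluster2',100,'Cluster2',124,'Cluster3',125,'Cluster3',126,'Cluster3',127,'Cluster3',1,2,3,]. When I print d.items after running this code, I get a regular list of tuples that does not contain 1, 2, 3. Did you add extra code to the solution I posted above?

It's fine now, it no longer shows up; very strange. Thanks a lot!

@Ajax, I just ran into something: it doesn't work when identical values are not consecutive, e.g. {9:[49,50,51],10:[49,50,51],11:[49,50,51],13:[13,14,15,16,17],18:[28,29],21:[38,39,40],22:[38,39,40],25:[28,29],33:[4,5,6],34:[4,5,6],35:[4,5,6],36:[24,25],37:[24,25],40:[13, 14, 15, 16, 17]}. Even though 13 and 40 have the same value, they are not consecutive, so they show up with different cluster IDs. Do you have any idea about this? If it helps: 13 Cluster2 18 Cluster3 21 Cluster4 22 Cluster4 23 Cluster4 25 Cluster5 33 Cluster6 34 Cluster6 35 Cluster6 36 Cluster7 37 Cluster7 40 Cluster8

Thanks @santhosh, nice.

@juanman, by the way, if you found this answer useful, you can upvote it.

Unfortunately I have already used my vote, and since my reputation is still below 15 I can't upvote yours as well; sorry about that.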
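The seen-list answer calls seen.index on every key, which scans the list each time. A minimal sketch of the same first-seen numbering with a dict lookup instead, assuming Python 3.7+ (where dicts preserve insertion order) and list values of hashable items; label_clusters is a hypothetical helper name. Because it looks values up rather than comparing neighbours, non-consecutive duplicates like keys 13 and 40 get the same cluster.

```python
def label_clusters(d):
    """Map each key to 'ClusterN', numbering distinct values in first-seen order."""
    cluster_of = {}  # tuple(value) -> cluster label
    result = {}
    for key, value in d.items():
        label_key = tuple(value)  # lists are unhashable; tuples can be dict keys
        if label_key not in cluster_of:
            cluster_of[label_key] = "Cluster{}".format(len(cluster_of) + 1)
        result[key] = cluster_of[label_key]
    return result


s = {9: [49, 50, 51], 10: [49, 50, 51], 13: [13, 14, 15, 16, 17],
     18: [28, 29], 25: [28, 29], 40: [13, 14, 15, 16, 17]}
print(label_clusters(s))
# {9: 'Cluster1', 10: 'Cluster1', 13: 'Cluster2', 18: 'Cluster3', 25: 'Cluster3', 40: 'Cluster2'}
```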
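d = <your dict>
# number each distinct value (lists converted to hashable tuples);
# the numbering follows set iteration order, which is arbitrary
set_dict = list(enumerate(set(tuple(i) for i in d.values()), 1))
# for each key, scan set_dict to find the number of its value
{k: 'Cluster' + str([i for i, j in set_dict if list(j) == v][0]) for k, v in d.items()}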
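In this answer the cluster numbers depend on set iteration order, and the comprehension rescans set_dict for every key. A small variant, assuming you want reproducible numbers: sort the distinct tuples first and build a lookup dict. Note that it numbers clusters by sorted value (as the groupby solution does), not by first appearance; uniques, number and result are names chosen for this sketch.

```python
d = {33: [3, 4, 6], 34: [3, 4, 6], 99: [7, 8], 124: [0, 1, 2, 5]}

# distinct values as tuples, sorted so the numbering is reproducible
uniques = sorted(set(tuple(v) for v in d.values()))
number = {u: i for i, u in enumerate(uniques, 1)}  # tuple -> cluster number

# one dict lookup per key instead of scanning a list each time
result = {k: "Cluster" + str(number[tuple(v)]) for k, v in d.items()}
print(result)
# {33: 'Cluster2', 34: 'Cluster2', 99: 'Cluster3', 124: 'Cluster1'}
```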