Deduplicating a list of objects by identity in Python

How can I deduplicate a list of objects in Python so that list_of_objects[i] is list_of_objects[j] returns True if and only if i == j?

For example:

I have two sets of numbers, and I build a dictionary with the numbers as keys and their clusters as values:

a = {1,2,3}
b = {4,5,6}
cur_dict = {1:a, 2:a, 3:a, 4:b, 5:b, 6:b}
duplicated_clusters = list(cur_dict.values())
duplicated_clusters
# [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 6}, {4, 5, 6}]
# How to process duplicated_clusters to get [{1, 2, 3}, {4, 5, 6}]?

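# set(duplicated_clusters) does not work: sets are mutable and therefore unhashable, so they cannot be elements of a set.

Since Python has no pointers, how can I get a list of objects deduplicated by identity (or is that impossible)? I can think of some workarounds, such as attaching an extra identifier or wrapping each object in a wrapper class, but none of them feels straightforward to me.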

Example 2:

A clearer example, added thanks to @wjandrea's comment:

a = {1,2,3}
b = {4,5,6}
c = {1,2,3}
duplicated_clusters = [a,a,b,b,c,c]
duplicated_clusters
# [{1, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 6}, {1, 2, 3}, {1, 2, 3}]
# Deduplicated clusters I want to obtain: [{1, 2, 3}, {4, 5, 6}, {1, 2, 3}], equivalent to [a,b,c]

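The built-in id function returns a value that is unique and constant for an object during its lifetime. You can use it as a key to identify duplicate objects: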

a = {1,2,3}
b = {4,5,6}
cur_dict = {1:a, 2:a, 3:a, 4:b, 5:b, 6:b}
duplicated_clusters = list(cur_dict.values())
result = list({id(x): x for x in duplicated_clusters}.values())
print(result)
Result:

[{1, 2, 3}, {4, 5, 6}]
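"There are no pointers in Python" is basically correct. In CPython, id returns the address of the object in memory, so it is effectively a pointer to that object. But this approach works even in more exotic implementations where id has nothing to do with memory addresses. As long as a is b implies id(a) == id(b) and vice versa, this approach should eliminate referential duplicates.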
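The same idea can be wrapped in a small helper and applied to Example 2 from the question. This is a minimal sketch, not the answer author's code: the function name dedupe_by_identity is hypothetical, and it uses an explicit loop with a set of seen ids instead of the dict comprehension above.

```python
def dedupe_by_identity(objects):
    """Return the objects with identity duplicates removed, keeping first occurrences."""
    seen_ids = set()
    unique = []
    for obj in objects:
        if id(obj) not in seen_ids:
            seen_ids.add(id(obj))
            unique.append(obj)
    return unique


a = {1, 2, 3}
b = {4, 5, 6}
c = {1, 2, 3}  # equal to a, but a distinct object

deduplicated = dedupe_by_identity([a, a, b, b, c, c])
print(deduplicated)  # [{1, 2, 3}, {4, 5, 6}, {1, 2, 3}]
print([x is y for x, y in zip(deduplicated, [a, b, c])])  # [True, True, True]
```

Note that id values are only guaranteed to be unique among objects that are alive at the same time. Here every kept object stays referenced by the result list, so no id can be reused while the loop runs.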


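That said, keep in mind that Python often "interns" values of certain built-in types, so objects you expect to be referentially unique may in fact be the same object. Consider this example: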

a = {1,2,3}
b = {1,2,3}
c = (4,5,6)
d = (4,5,6)
e = int("23")           #the parser doesn't know what value this will be until runtime
f = 23
g = int("456789101112") #the parser doesn't know what value this will be until runtime
h = 456789101112
i = 456789101111+1      #the parser knows at compile time that this evaluates to 456789101112
cur_dict = {1:a, 2:b, 3:c, 4:d, 5:e, 6:f, 7:g, 8:h, 9:i}
duplicated_clusters = list(cur_dict.values())
result = list({id(x): x for x in duplicated_clusters}.values())
print(result)
Result (in CPython):

[{1, 2, 3}, {1, 2, 3}, (4, 5, 6), 23, 456789101112, 456789101112]

Sets are mutable, so they are never interned. Tuples are immutable, so they can be interned. Small integers are interned even if you go out of your way to create them so that the parser cannot guess their value at compile time. Large integers are usually not interned, although two large integer values created by arithmetic expressions that can be optimized into a constant at compile time can still be referentially identical.

Not quite; I'm asking about deduplicating by object identity. For example, in my case, {1,2,3} is {1,2,3} evaluates to False.
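The distinction drawn in the comment above, equality versus identity, can be checked directly with == and is. This is a minimal sketch with illustrative variable names. The integer results assume CPython, where the small-integer cache is an implementation detail and not a language guarantee.

```python
# Two sets built separately: equal contents, distinct objects.
a = {1, 2, 3}
b = {1, 2, 3}
print(a == b, a is b)  # True False

# Small integers created at runtime still share one cached object in CPython.
small_x = int("23")
small_y = int("23")
print(small_x == small_y, small_x is small_y)  # True True (CPython)

# Large integers created at runtime are separate objects in CPython.
large_x = int("456789101112")
large_y = int("456789101112")
print(large_x == large_y, large_x is large_y)  # True False (CPython)
```

Because identity-based deduplication compares objects with is and not ==, it keeps both a and b here, which is the behavior the question asks for.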