Deduplicating a list of objects in Python


How can I deduplicate a list of objects in Python so that list_of_objects[i] is list_of_objects[j] returns True if and only if i == j?

For example, I have two sets of numbers, and I build a dictionary that maps each number (the key) to its cluster (the value):

a = {1,2,3}
b = {4,5,6}
cur_dict = {1:a, 2:a, 3:a, 4:b, 5:b, 6:b}
duplicated_clusters = list(cur_dict.values())
duplicated_clusters
# [{1, 2, 3}, {1, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 6}, {4, 5, 6}]
# How to process duplicated_clusters to get [{1, 2, 3}, {4, 5, 6}]?

# set(duplicated_clusters) does not work here: sets are mutable and therefore unhashable.

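Since Python has no pointers, how can I obtain a list of objects deduplicated by identity (or is that impossible)? I can think of some workarounds, such as using an extra identifier or wrapping each object in a wrapper class, but none of them seems straightforward to me.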
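For illustration, the wrapper-class workaround mentioned above might look like the following sketch. The class name IdentityKey is hypothetical and not part of the original question; it simply makes hashing and equality depend on object identity rather than on value.

```python
class IdentityKey:
    """Hypothetical wrapper that hashes and compares by object identity."""

    __slots__ = ("obj",)

    def __init__(self, obj):
        self.obj = obj

    def __hash__(self):
        # id() is stable for as long as the wrapped object is alive.
        return id(self.obj)

    def __eq__(self, other):
        return isinstance(other, IdentityKey) and self.obj is other.obj


a = {1, 2, 3}
b = {4, 5, 6}
clusters = [a, a, a, b, b, b]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique = [key.obj for key in dict.fromkeys(IdentityKey(c) for c in clusters)]
print(unique)  # [{1, 2, 3}, {4, 5, 6}]
```

The wrapper is more verbose than strictly necessary, which is presumably why it does not feel straightforward.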

Example 2:

A clearer example, added thanks to @wjandrea's comment:

a = {1,2,3}
b = {4,5,6}
c = {1,2,3}
duplicated_clusters = [a,a,b,b,c,c]
duplicated_clusters
# [{1, 2, 3}, {1, 2, 3}, {4, 5, 6}, {4, 5, 6}, {1, 2, 3}, {1, 2, 3}]
# Deduplicated clusters I want to obtain: [{1, 2, 3}, {4, 5, 6}, {1, 2, 3}], equivalent to [a,b,c]

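The built-in id function returns a value that is unique and constant for an object during its lifetime. It can be used as a key to identify duplicate objects: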

a = {1,2,3}
b = {4,5,6}
cur_dict = {1:a, 2:a, 3:a, 4:b, 5:b, 6:b}
duplicated_clusters = list(cur_dict.values())
result = list({id(x): x for x in duplicated_clusters}.values())
print(result)

Result:

[{1, 2, 3}, {4, 5, 6}]
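The same idea can be applied to Example 2 from the question. The sketch below uses a hypothetical helper, dedupe_by_identity (the name is an assumption, not from the original answer), which keeps the first occurrence of each object and tracks the ids already seen.

```python
def dedupe_by_identity(items):
    """Return the items with repeated references removed, keeping first occurrences."""
    seen_ids = set()
    unique = []
    for item in items:
        if id(item) not in seen_ids:
            seen_ids.add(id(item))
            unique.append(item)  # keeping the object alive keeps its id valid
    return unique


a = {1, 2, 3}
b = {4, 5, 6}
c = {1, 2, 3}  # equal to a, but a different object

result = dedupe_by_identity([a, a, b, b, c, c])
print(result)  # [{1, 2, 3}, {4, 5, 6}, {1, 2, 3}]
```

Here a and c both survive because they are distinct objects, even though they compare equal.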

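"There are no pointers in Python" is basically true. In CPython, id returns the address of the object in memory, so it is effectively a pointer to that object. But this approach works even in more exotic implementations where id has nothing to do with memory addresses. As long as a is b implies id(a) == id(b) and vice versa, this approach should eliminate referential duplicates.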
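The property requested in the question can also be checked directly with the is operator, without depending on what id returns. The following is a minimal sketch; the helper name all_distinct_objects is an assumption made for illustration.

```python
from itertools import combinations


def all_distinct_objects(objects):
    """True if no two positions in the list refer to the same object."""
    return all(x is not y for x, y in combinations(objects, 2))


a = {1, 2, 3}
b = {4, 5, 6}
duplicated = [a, a, a, b, b, b]
result = list({id(x): x for x in duplicated}.values())

print(all_distinct_objects(duplicated))  # False
print(all_distinct_objects(result))      # True
```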

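... That said, keep in mind that Python often "interns" certain kinds of built-in values, so objects you expect to be referentially unique may actually be the same object. Consider this example: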

a = {1,2,3}
b = {1,2,3}
c = (4,5,6)
d = (4,5,6)
e = int("23")           #the parser doesn't know what value this will be until runtime
f = 23
g = int("456789101112") #the parser doesn't know what value this will be until runtime
h = 456789101112
i = 456789101111+1      #the parser knows at compile time that this evaluates to 456789101112
cur_dict = {1:a, 2:b, 3:c, 4:d, 5:e, 6:f, 7:g, 8:h, 9:i}
duplicated_clusters = list(cur_dict.values())
result = list({id(x): x for x in duplicated_clusters}.values())
print(result)

Result (in CPython):

[{1, 2, 3}, {1, 2, 3}, (4, 5, 6), 23, 456789101112, 456789101112]

Sets are mutable, so they are never interned. Tuples are immutable, so they can be interned. Small integers are interned, even if you go out of your way to create them in a way that prevents the parser from guessing their value at compile time. Large integers are typically not interned, although two large integer values created from arithmetic expressions that can be optimized to a constant at compile time will still be referentially identical.

Not quite; what I am asking about is deduplicating by object identity. For example, in my case, {1,2,3} is {1,2,3} evaluates to False.
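To make the distinction between equality and identity concrete, here is a small sketch that contrasts the two kinds of deduplication on equal but distinct sets. Using frozenset as a value-based key is an assumption made for the comparison; it is not something the question proposes.

```python
a = {1, 2, 3}
c = {1, 2, 3}

print(a == c)  # True: same contents
print(a is c)  # False: two separate objects

clusters = [a, a, c, c]

# Deduplicating by value merges a and c into a single entry.
by_value = list({frozenset(s): s for s in clusters}.values())

# Deduplicating by identity keeps a and c apart.
by_identity = list({id(s): s for s in clusters}.values())

print(len(by_value), len(by_identity))  # 1 2
```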