Python: remove elements from a list based on a field of the element


I have a list of tuples, where each tuple has two items: the first is a dictionary and the second is a string.

all_values = [
              ({'x1': 1, 'y1': 2}, 'str1'), 
              ({'x2': 1, 'y2': 2}, 'str2'), 
              ({'x3': 1, 'y3': 2}, 'str3'),
              ({'x4': 1, 'y4': 2}, 'str1'),
             ]

I want to remove duplicates from the list based on the second item of each tuple. I wrote this code, but I would like to improve it:

flag = False
items = []
for index, item in enumerate(all_values):
    for j in range(0, index):
        if all_values[j][1] == all_values[index][1]:
            flag = True
    if not flag:
        items.append(item)
    flag = False

and got this:

items = [
         ({'x1': 1, 'y1': 2}, 'str1'), 
         ({'x2': 1, 'y2': 2}, 'str2'), 
         ({'x3': 1, 'y3': 2}, 'str3')
        ]

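Any help would be appreciated.

By the way, I tried to use list(set(all_values)) to remove the duplicates, but I got the error:

TypeError: unhashable type: 'dict'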
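
The error can be reproduced in isolation. A set hashes each of its elements, and a tuple is hashable only if every item inside it is; the dictionary in the first position is not. The following is a minimal sketch (written for Python 3, with the helper names error_message and distinct_keys introduced only for illustration) showing the failure, and showing that the string in the second position can be hashed without trouble:

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x4': 1, 'y4': 2}, 'str1'),
]

# set() hashes every element; a tuple that contains a dict cannot be hashed.
try:
    set(all_values)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The second item of each tuple is a plain string, so it can go into a set.
distinct_keys = {item[1] for item in all_values}
```

This is why de-duplicating on the second item alone is workable even though the tuples as a whole cannot be placed in a set.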

You can use the following code:

items = []
for item in all_values:
    if next((i for i in items if i[1] == item[1]), None) is None:
        items.append(item)

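The object-oriented approach is not shorter, but it is more intuitive and readable/maintainable (IMHO).

First create an object that mimics the tuple and provides the additional __hash__() and __eq__() methods, which set will later use to check the objects for uniqueness.

The __repr__() method is declared for debugging: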

class Tup(object):
    def __init__(self, t):
        self.t = t

    def __eq__(self, other):
        # two wrapped tuples are equal when their second items match
        return self.t[1] == other.t[1]

    def __hash__(self):
        return hash(self.t[1])

    def __repr__(self):
        return str(self.t)

# now you can declare:
all_values = [
              ({'x1': 1, 'y1': 2}, 'str1'),
              ({'x2': 1, 'y2': 2}, 'str2'),
              ({'x2': 1, 'y2': 2}, 'str2'),
              ({'x3': 1, 'y3': 2}, 'str3'),
              ({'x3': 1, 'y3': 2}, 'str3')
             ]

# create your objects and put them in a list
all_vals = [Tup(x) for x in all_values]

print(all_vals)  # [({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3'), ({'x3': 1, 'y3': 2}, 'str3')]

# and use a set for uniqueness (a set does not keep the original order)
unique = set(all_vals)
print(unique)  # three objects, one each for 'str1', 'str2' and 'str3'
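
A set does not keep the order of the input. If order matters, one possible variation (a sketch assuming Python 3.7 or later, where a plain dict preserves insertion order; the name unique_in_order is introduced here only for illustration) is to use the wrapper objects as dictionary keys and then unwrap them again:

```python
class Tup:
    """Wraps a (dict, str) tuple; equality and hash use only the string."""

    def __init__(self, t):
        self.t = t

    def __eq__(self, other):
        return self.t[1] == other.t[1]

    def __hash__(self):
        return hash(self.t[1])


all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x3': 1, 'y3': 2}, 'str3'),
]

# dict.fromkeys keeps the first key object seen for each distinct hash/eq
# value, in insertion order; .t unwraps the original tuple again.
unique_in_order = [w.t for w in dict.fromkeys(Tup(v) for v in all_values)]
```

The class is repeated here only so that the sketch runs on its own.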

For those who think size matters, here is another, shorter version ;)

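Use another list (strings) to collect the second item, the string, of each tuple. That gives you a clear way to check whether the current list item is a duplicate.

In the code below I added a duplicate list item (with the value 'str2') for demonstration.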

all_values = [
              ({'x1': 1, 'y1': 2}, 'str1'),
              ({'x2': 1, 'y2': 2}, 'str2'),
              ({'x5': 8, 'ab': 7}, 'str2'),
              ({'x3': 1, 'y3': 2}, 'str3')
             ]

strings = []
result = []
for value in all_values:
    if value[1] not in strings:
        strings.append(value[1])
        result.append(value)

The new de-duplicated list will be in result.
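
Checking membership in a list scans the list each time, which can become slow when the input is large. A possible variation, sketched here for Python 3 under the assumption that the second items are hashable (they are strings in the question), keeps the seen strings in a set instead; the name seen is introduced only for this illustration:

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
]

seen = set()   # second items met so far; set lookup is O(1) on average
result = []    # first tuple found for each distinct second item
for value in all_values:
    key = value[1]
    if key not in seen:
        seen.add(key)
        result.append(value)
```

The first tuple for each string is kept and the original order is preserved, the same as with the list-based version.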

If you do not care about ordering, use a dict:

formattedValues = {}
# Iterate with reversed() if you want the first duplicate to be kept
# Iterate without reversed() if you want the last duplicate to be kept
for v in reversed(all_values):
    formattedValues[v[1]] = v
# list(formattedValues.values()) then holds the de-duplicated tuples

If you need ordering, use an OrderedDict:

from collections import OrderedDict

formattedValues = OrderedDict()
# Note: entries are ordered as they are first met in the reversed iteration
for v in reversed(all_values):
    formattedValues[v[1]] = v
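
Iterating in reverse keeps the first duplicate, but the entries then follow the reversed traversal rather than the original list. A possible alternative, sketched under the assumption of Python 3.7 or later (where a plain dict preserves insertion order; first_by_key is a name introduced only for this illustration), uses setdefault so that the first tuple for each string is kept and the original order is retained:

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
]

first_by_key = {}
for value in all_values:
    # setdefault stores the value only if the key is not present yet,
    # so later duplicates never overwrite the first occurrence.
    first_by_key.setdefault(value[1], value)

result = list(first_by_key.values())
```

To keep the last duplicate instead, a plain assignment first_by_key[value[1]] = value would do, at the cost of the key staying in the position where it was first met.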

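There is a function, unique_everseen (from the itertools recipes), that returns an iterator over the unique items of the iterable you pass in, based on the key you pass in. If you want a list as the result, just pass the result to list(); but if you have a lot of data, it is better to simply iterate, which saves memory.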

from itertools import filterfalse
from operator import itemgetter

all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')]


def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

print(list(unique_everseen(all_values, itemgetter(1))))

Output:

[({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3')]

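The second items in the tuples are str1, str2 and str3. They are already unique. Can you clarify? Also, example input and expected output lists would be useful.

@Marcin It is an example. In the real world there is a large list with some duplicates.

This seems like overkill to me. A short/local answer also helps readability. Of course, your answer would be more extensible.

@mogambo Overkill? The class definition is only about 9 lines long. Loading the objects is another line, and using the set takes two lines. Half of the code you see here is printing, comments and the original structure. 12 lines of code is not bad (again, IMO) ;)

Right. But compared with the 2 lines of my code (as an example & shameless plug :P), the class definition is definitely too much! It can't be less than 4 lines. Anyway, I didn't say it was wrong, I just thought I should put in my 2c :)

@mogambo Here you go, 4 lines (not counting the final print).

But now you are not using your class definition, or am I just missing something? If you are not, can I take it that you agree with me that the class definition is overkill? ;)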