Python: Remove dictionaries from a list based on a condition


python python-3.x


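I have the following list of dictionaries. I need to remove dictionaries that have the same received_on and customer_group values, but keep one random item from each such group.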

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

Expected output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]

Here is one way to keep the first dictionary for each unique datetime. If you want a random item instead, you can shuffle the list first.

Output:

[{'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51',
  'customer_group': 'Life-time Buyer'},
 {'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29',
  'customer_group': 'Lead'}]

[{'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'}, {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'}]
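The code for this answer is not shown above. A minimal sketch of the approach it describes, assuming the data list from the question: shuffle a copy of the list, then keep the first dictionary seen for each received_on value. The names shuffled and first_seen are illustrative, not taken from the answer.

```python
import random

data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

shuffled = data[:]        # shallow copy so the original list keeps its order
random.shuffle(shuffled)  # shuffle the copy in place

first_seen = {}
for d in shuffled:
    # setdefault stores d only if this received_on is not a key yet,
    # so the first dict in the shuffled order wins
    first_seen.setdefault(d['received_on'], d)

result = list(first_seen.values())
print(result)
```

Skipping the shuffle keeps the first dictionary in the original order, which corresponds to the first output above.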

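I think it is easier to add the dictionaries whose received_on value has not been seen so far than to filter out the dictionaries with a duplicate received_on: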

result = []
receivedList = []  # received_on values seen so far
for d in data:
    # keep only the first dict for each received_on value
    if d['received_on'] not in receivedList:
        result.append(d)
        receivedList.append(d['received_on'])

print(result)
[{'customer_group': 'Life-time Buyer',
  'id': '16e26a4a9f97fa4f',
  'received_on': '2019-11-01 11:05:51'},
 {'customer_group': 'Lead',
  'id': '16db0dd4a42673e2',
  'received_on': '2019-10-09 14:12:29'}]
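The same seen-so-far idea generalizes to any key. A sketch, assuming a hypothetical helper unique_by that takes a key function; it tracks seen keys in a set, so each membership test is O(1) on average instead of a scan of a list. The key must be hashable, which is why the two-field version uses a tuple.

```python
def unique_by(items, key):
    """Keep the first item for each distinct key(item), preserving order.

    Hypothetical helper; key must return a hashable value (str, tuple, ...).
    """
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:  # set lookup instead of scanning a list
            seen.add(k)
            result.append(item)
    return result


data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

by_received = unique_by(data, lambda d: d['received_on'])
by_both = unique_by(data, lambda d: (d['received_on'], d['customer_group']))
print([d['id'] for d in by_received])  # ['16e26a4a9f97fa4f', '16db0dd4a42673e2']
print([d['id'] for d in by_both])      # ['16e26a4a9f97fa4f', '16db0dd4a42673e2']
```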

Here is a better way, appending to a new list:

data = [
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51',
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    },
    {
        'id': '16db0dd4199f5897',
        'received_on': '2019-10-09 14:12:29',
        'customer_group': 'Lead'
    }
]
seen_pairs = []  # (received_on, customer_group) pairs already kept
unique_data = []
for i in data:
    # skip i only if BOTH fields match a dict that was already kept
    pair = (i['received_on'], i['customer_group'])
    if pair not in seen_pairs:
        unique_data.append(i)
        seen_pairs.append(pair)

print(unique_data)

Output:

[
    {
        'id': '16e26a4a9f97fa4f',
        'received_on': '2019-11-01 11:05:51', 
        'customer_group': 'Life-time Buyer'
    },
    {
        'id': '16db0dd4a42673e2', 
        'received_on': '2019-10-09 14:12:29', 
        'customer_group': 'Lead'
    }
]

Using some of the ideas above, I also wanted customer_group as a second condition in addition to received_on. This gives me the expected result:

conditions, result = [], []
for d in data:
    # a dict is a duplicate only if both fields match one already kept
    condition = (d['received_on'], d['customer_group'])
    if condition not in conditions:
        result.append(d)
        conditions.append(condition)
print(len(result))  # 2 for the sample data
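If it does not matter which duplicate survives, a dict comprehension keyed on the same tuple is a shorter alternative. This sketch relies on two documented behaviors: a later entry with an equal key overwrites the earlier value, and dicts preserve the insertion order of their keys (guaranteed since Python 3.7). It therefore keeps the last dictionary for each pair, which for the sample data happens to match the expected output in the question.

```python
data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

# Later dicts overwrite earlier ones with the same key, so the LAST dict per
# (received_on, customer_group) pair survives; each key keeps the position
# where it was first inserted.
by_key = {(d['received_on'], d['customer_group']): d for d in data}
result = list(by_key.values())
print([d['id'] for d in result])  # ['16e26a4a9f97fa4f', '16db0dd4199f5897']
```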

You can sort with a custom key, group with itertools.groupby, and then use random.choice on each group it returns.

Sort the list:

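import itertools
import random

# Sort key: dicts with the same (received_on, customer_group) end up adjacent
keyfunc = lambda x: (x['received_on'], x['customer_group'])
data.sort(key=keyfunc)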

Group them:

g = itertools.groupby(data, keyfunc)
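Sorting first matters because itertools.groupby only merges consecutive items with equal keys. A small sketch with hypothetical rows in which the two Lead entries are not adjacent (the ids 'a', 'b', 'c' are made up for illustration):

```python
import itertools


def keyfunc(x):
    # same grouping key as above
    return (x['received_on'], x['customer_group'])


# Hypothetical input: the two 'Lead' rows are separated by another row.
rows = [
    {'id': 'a', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': 'b', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': 'c', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

unsorted_groups = [[r['id'] for r in g] for _, g in itertools.groupby(rows, keyfunc)]
sorted_groups = [[r['id'] for r in g]
                 for _, g in itertools.groupby(sorted(rows, key=keyfunc), keyfunc)]
print(unsorted_groups)  # [['a'], ['b'], ['c']]  (duplicates not merged)
print(sorted_groups)    # [['a', 'c'], ['b']]
```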

Picking a random element requires converting each group iterator into a sequence:

result = [random.choice(list(group)) for k, group in g]
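The conversion is needed because random.choice calls len() on its argument and indexes into it, while a groupby group is a lazy iterator with neither. A quick check on a hypothetical list of integers:

```python
import itertools
import random

groups = itertools.groupby([1, 1, 2])
key, group = next(groups)  # key == 1; group is a lazy iterator

try:
    random.choice(group)   # fails: the iterator has no len()
    raised = False
except TypeError:
    raised = True

members = list(group)          # materialize the group into a list
picked = random.choice(members)
print(raised, members, picked)  # True [1, 1] 1
```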

Normally I would keep the key function separate, especially since it is used twice, and merge only the last two steps into:

result = [random.choice(list(group)) for k, group in itertools.groupby(data, keyfunc)]

However, you could use sorted to write a huge, redundant one-liner:

result = [random.choice(list(group)) for k, group in itertools.groupby(sorted(data, key=lambda x: (x['received_on'], x['customer_group'])), key=lambda x: (x['received_on'], x['customer_group']))]
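If you would rather not sort (sorting reorders the result and costs O(n log n)), one alternative is to bucket the dictionaries by key in a single pass and pick from each bucket. A sketch, assuming a hypothetical helper random_per_key; groups come out in the order their key first appears, because dicts (including defaultdict) preserve insertion order.

```python
import random
from collections import defaultdict


def random_per_key(items, key):
    """Return one randomly chosen item per key, in first-seen key order.

    Hypothetical helper: buckets items in a dict instead of sorting.
    """
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(item)  # one list per distinct key
    return [random.choice(group) for group in buckets.values()]


data = [
    {'id': '16e26a4a9f97fa4f', 'received_on': '2019-11-01 11:05:51', 'customer_group': 'Life-time Buyer'},
    {'id': '16db0dd4a42673e2', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
    {'id': '16db0dd4199f5897', 'received_on': '2019-10-09 14:12:29', 'customer_group': 'Lead'},
]

result = random_per_key(data, lambda d: (d['received_on'], d['customer_group']))
print(result)
```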