Python: how to filter a list by semi-unique values
I have a dataset that needs a "unique" filter applied to it. Basically, I want to remove every row in which the same user buys the same product more than once on the same day, regardless of the device, which may vary. Where there are multiple occurrences, I want to keep only the first row.

Data:
datetime, device, product, user
[
['2013-07-08 15:00:00', 'pc', 'X', 'A'],
['2013-07-09 17:00:00', 'pc', 'X', 'A'],
['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
['2013-07-10 21:00:00', 'ipho', 'Y', 'B'], <- second occurrence of B getting Y that day
['2013-07-10 22:00:00', 'andr', 'Y', 'B'], <- third occurrence of B getting Y that day
['2013-07-10 02:00:00', 'ipho', 'Z', 'C'],
['2013-07-10 11:00:00', 'pc', 'Z', 'C'] <- second occurrence of C getting Z that day
]
Expected result:
['2013-07-08 15:00:00', 'pc', 'X', 'A'],
['2013-07-09 17:00:00', 'pc', 'X', 'A'],
['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
['2013-07-10 02:00:00', 'ipho', 'Z', 'C'],
['2013-07-10 11:00:00', 'pc', 'Z', 'C']

How would I do this?

Strip the time part from the datetime, then store each row in a dictionary if its key is not already present. Use a tuple of (date, product, user) as the dictionary key. For example:
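d = {}
for datetime, device, product, user in table:
    date = datetime[:10]  # 'YYYY-MM-DD' part of the timestamp string
    if (date, product, user) not in d:
        # first row seen for this (date, product, user); later ones are skipped
        d[(date, product, user)] = [datetime, device, product, user]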
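
For completeness, here is a minimal self-contained sketch of the same idea that returns the filtered rows as a list in their original order. The function name keep_first_per_day and the variables seen and kept are made up for illustration. It assumes every timestamp is a string starting with 'YYYY-MM-DD', and that "first" means first in list order, so sort the rows by timestamp beforehand if they may be out of order.

```python
def keep_first_per_day(rows):
    """Keep the first row for each (date, product, user); device is ignored.

    Hypothetical helper: assumes each row is [timestamp, device, product, user]
    and that the timestamp string starts with 'YYYY-MM-DD'.
    """
    seen = set()   # (date, product, user) keys already encountered
    kept = []      # rows that survive the filter, in input order
    for timestamp, device, product, user in rows:
        key = (timestamp[:10], product, user)
        if key not in seen:
            seen.add(key)
            kept.append([timestamp, device, product, user])
    return kept


table = [
    ['2013-07-08 15:00:00', 'pc', 'X', 'A'],
    ['2013-07-09 17:00:00', 'pc', 'X', 'A'],
    ['2013-07-09 10:00:00', 'andr', 'Y', 'B'],
    ['2013-07-10 18:00:00', 'pc', 'Y', 'B'],
    ['2013-07-10 21:00:00', 'ipho', 'Y', 'B'],
    ['2013-07-10 22:00:00', 'andr', 'Y', 'B'],
    ['2013-07-10 02:00:00', 'ipho', 'Z', 'C'],
    ['2013-07-10 11:00:00', 'pc', 'Z', 'C'],
]

filtered = keep_first_per_day(table)
for row in filtered:
    print(row)
```

Applied strictly, the rule also drops the 11:00 row for user C, since C already bought Z at 02:00 that day, so five rows remain. With the dictionary version above, list(d.values()) should give the same rows; on Python 3.7+ dictionaries preserve insertion order, so the original order is kept there too.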