Python: grouping items in a tuple without duplicates
I have:
(('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))
I want:
(('A', 'E', '1', 'UTC\xb100:00'), ('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('F', '1', 'UTC+03:00'))
I've seen that this can be done with lists, but I haven't seen it done with tuples. Is this possible?

Using pandas may be overkill, but you can:
import pandas as pd

# for some reason, pandas 0.12.0 prefers
# a list of tuples to a tuple of tuples
t = [('A', '1', 'UTC\xb100:00'),
     ('B', '1', 'UTC+01:00'),
     ('C', '1', 'UTC+02:00'),
     ('D', '1', 'UTC+01:00'),
     ('E', '1', 'UTC\xb100:00'),
     ('F', '1', 'UTC+03:00')]
df = pd.DataFrame(t, columns=['letter', 'digit', 'tz'])
grouped = df.groupby('tz')
print(grouped.groups)
# prints something like:
# {'UTC+01:00': [1, 3],
#  'UTC+02:00': [2],
#  'UTC+03:00': [5],
#  'UTC\xb100:00': [0, 4]}

merged = []
for key, vals in grouped.groups.items():
    update = [t[idx][0] for idx in vals]  # add the letters
    update += t[vals[0]][1:]  # add the digit and the TZ of the group's first row
    merged.append(update)
print(merged)
# e.g. (the order of the groups may differ):
# [['F', '1', 'UTC+03:00'], ['C', '1', 'UTC+02:00'],
#  ['A', 'E', '1', 'UTC\xb100:00'], ['B', 'D', '1', 'UTC+01:00']]
The upside is a rather concise df.groupby('tz'); the downside is a fairly heavy dependency (pandas plus its own dependencies).
The merge can be compressed into a single, less readable line:
merged = [[t[idx][0] for idx in vs] + list(t[vs[0]][1:])
          for vs in grouped.groups.values()]

You can use itertools.groupby, but you need to sort the input first, like this:
from itertools import groupby
from operator import itemgetter

l = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))
result = []
key_items = itemgetter(1, 2)  # group on (digit, tz)
# groupby only merges adjacent items, hence the sort on the same key
for key, group in groupby(sorted(l, key=key_items), key=key_items):
    item = []
    item.extend([k[0] for k in group])  # the letters of this group
    item.extend(key)  # the shared digit and tz
    result.append(tuple(item))
print(tuple(result))
This code prints (Python 3 displays '\xb1' as '±'):
(('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('F', '1', 'UTC+03:00'), ('A', 'E', '1', 'UTC\xb100:00'))
I realize it isn't very pretty.
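If sorting the input is undesirable, a plain dictionary can collect the groups in a single pass. The following is only a sketch, not taken from any of the answers: the helper name `group_by_tail` is hypothetical, and it assumes Python 3.7 or later, where dictionaries keep insertion order, so the groups come out in the order they first appear in the input.

```python
def group_by_tail(rows):
    """Merge rows that share every field after the first one.

    Hypothetical helper for illustration only.
    """
    groups = {}  # maps (digit, tz) -> list of letters, in first-seen order
    for row in rows:
        groups.setdefault(row[1:], []).append(row[0])
    # rebuild one flat tuple per group: letters first, then the shared fields
    return tuple(tuple(letters) + tail for tail, letters in groups.items())


data = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'),
        ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'),
        ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))
grouped = group_by_tail(data)
print(grouped)
```

Because nothing is sorted, the result keeps the A/E group first, which matches the ordering asked for in the question.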

You can also use comprehensions, although it is still a bit complicated:
tuples = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'), ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))
# distinct (digit, tz) pairs; the order of a set is arbitrary
values = set(map(lambda x: x[1:3], tuples))
# e.g. {('1', 'UTC+03:00'), ('1', 'UTC\xb100:00'), ('1', 'UTC+01:00'), ('1', 'UTC+02:00')}
# the letters that belong to each pair
f = [[y[0] for y in tuples if y[1:3] == x] for x in values]
# e.g. [['F'], ['A', 'E'], ['B', 'D'], ['C']]
# pair each group of letters with its (digit, tz) pair
r = zip((tuple(t) for t in f), values)
# e.g. (('F',), ('1', 'UTC+03:00')), (('A', 'E'), ('1', 'UTC\xb100:00')), ...
# concatenate each pair of tuples into one flat tuple
result = tuple([sum(e, ()) for e in r])
# e.g. (('F', '1', 'UTC+03:00'), ('A', 'E', '1', 'UTC\xb100:00'), ('B', 'D', '1', 'UTC+01:00'), ('C', '1', 'UTC+02:00'))
In summary:
values = set(map(lambda x:x[1:3], tuples))
f = [[y[0] for y in tuples if y[1:3]==x] for x in values]
r = zip((tuple(t) for t in f), values)
result = tuple([sum(e, ()) for e in r])
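Since `set` has no defined order, the groups produced by the comprehension version can come out in any order. As a hedged variation (not part of the original answer, and assuming Python 3.7+ so that `dict.fromkeys` preserves insertion order), the distinct pairs can be collected in input order instead:

```python
tuples = (('A', '1', 'UTC\xb100:00'), ('B', '1', 'UTC+01:00'),
          ('C', '1', 'UTC+02:00'), ('D', '1', 'UTC+01:00'),
          ('E', '1', 'UTC\xb100:00'), ('F', '1', 'UTC+03:00'))

# dict.fromkeys drops repeated (digit, tz) pairs but keeps first-seen order
values = list(dict.fromkeys(x[1:3] for x in tuples))

# for each pair, gather its letters and append the pair itself
result = tuple(
    tuple(y[0] for y in tuples if y[1:3] == x) + x
    for x in values
)
print(result)
```

Like the original, this rescans the whole input once per distinct pair, which is fine for a handful of rows but grows quadratically in the worst case.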

With tuples you are not allowed to modify the contents, but you can concatenate tuples to obtain new ones:
def process(data):
    res = []
    # the key is the last five characters of the time zone (hh:mm), so the sign is ignored
    for row in sorted(data, key=lambda x: x[2][-5:]):
        # the time zone is always the last field, even after letters have been merged in
        if res and res[-1][-1][-5:] == row[2][-5:]:
            # Same group... do the merge
            res[-1] = res[-1][:-2] + (row[0],) + res[-1][-2:]
        else:
            # Different group
            res.append(row)
    return res
In my opinion the final result looks more like a list (logically homogeneous content) than a tuple, but if you really need a tuple you can return tuple(res). This answer works if you only care that items with the same code end up in the same tuple.
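The slicing idiom used in the merge step can be tried in isolation. The snippet below is a small illustration with made-up values (the letters 'D' and 'G' are hypothetical): assigning to an element of a tuple raises `TypeError`, whereas concatenating slices builds a new tuple and leaves the original untouched.

```python
group = ('A', '1', 'UTC+01:00')

# In-place modification is rejected for tuples
try:
    group[0] = 'Z'
    assignment_worked = True
except TypeError:
    assignment_worked = False

# Insert each new letter just before the last two fields (digit and tz)
merged = group
for letter in ('D', 'G'):
    merged = merged[:-2] + (letter,) + merged[-2:]

print(assignment_worked)  # False
print(merged)             # ('A', 'D', 'G', '1', 'UTC+01:00')
print(group)              # ('A', '1', 'UTC+01:00'), unchanged
```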