Python: comparing a few (but not all) elements of the namedtuples in a list
I have a list of namedtuples that can get quite long (currently up to 10,000 rows, but it may grow in the future).
I need to compare a few elements of each namedtuple against all the other namedtuples in the list, and I am looking for an efficient and general way to do this.
To keep things simple, I'll use an analogy with cakes, which should make the problem easier to understand. There is a list of namedtuples, where each namedtuple is a cake:
from collections import namedtuple

Cake = namedtuple('Cake',
                  ['cake_id',
                   'ingredient1', 'ingredient2', 'ingredient3',
                   'baking_time', 'cake_price']
                  )
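Both cake_price and baking_time matter. Among cakes with the same ingredients, I want to remove the irrelevant ones from the list. A cake is irrelevant if another cake with the same ingredients is equally or less expensive and takes the same or less time to bake (there is a detailed example below).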
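The relevance rule above can be written down as a small predicate. This is only a sketch of how the rule reads: the name `is_irrelevant` and the cake `slow_cake` are hypothetical, and the values for cake 1 and cake 4 are taken from the output shown in the first answer.

```python
from collections import namedtuple

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])

def is_irrelevant(cake, other):
    """Return True if `other` makes `cake` irrelevant (hypothetical helper).

    `other` must use the same ingredients and be at least as cheap
    and at least as fast to bake.
    """
    same_ingredients = (
        (cake.ingredient1, cake.ingredient2, cake.ingredient3)
        == (other.ingredient1, other.ingredient2, other.ingredient3)
    )
    return (same_ingredients
            and cake.cake_price >= other.cake_price
            and cake.baking_time >= other.baking_time)

cake_1 = Cake(1, 'dark chocolate', 'cookies', 'strawberries', 60, 20)
cake_4 = Cake(4, 'dark chocolate', 'cookies', 'strawberries', 40, 30)
# Hypothetical cake: same price as cake_1 but slower to bake.
slow_cake = Cake(99, 'dark chocolate', 'cookies', 'strawberries', 75, 20)

print(is_irrelevant(slow_cake, cake_1))  # True: same price, longer baking time
print(is_irrelevant(cake_4, cake_1))     # False: more expensive, but faster
```

Note that under this definition a cake makes itself irrelevant, so a caller should never compare a cake with itself.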
What is the best way to do this?
Approach
So far, what I have done is sort the list of namedtuples by cake_price and baking_time:
sorted_cakes = sorted(list_of_cakes, key=lambda c: (c.cake_price, c.baking_time))
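Then I build a new list and add each cake to it, unless a previously added cake with the same ingredients is at least as cheap and at least as fast to bake: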
def interesting_cake(current_cake, list_of_good_cakes):
    is_interesting = True
    if list_of_good_cakes:  # first cake to be directly appended
        for included_cake in list_of_good_cakes:
            if (current_cake.ingredient1 == included_cake.ingredient1 and
                    current_cake.ingredient2 == included_cake.ingredient2 and
                    current_cake.ingredient3 == included_cake.ingredient3 and
                    current_cake.baking_time >= included_cake.baking_time):
                if current_cake.cake_price >= included_cake.cake_price:
                    is_interesting = False
    return is_interesting

list_of_good_cakes = []
for cake in sorted_cakes:
    if interesting_cake(cake, list_of_good_cakes):
        list_of_good_cakes.append(cake)
(I know the nested loop is far from optimal, but I can't think of any other way to do it...)
Example
Given
list_of_cakes = [cake_1, cake_2, cake_3, cake_4, cake_5]
the expected result would be:
list_of_relevant_cakes = [cake_1, cake_3, cake_4, cake_5]
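- cake_1 is the cheapest (and the fastest among those at the same price)
- cake_2 has the same price as cake_1 and takes longer to bake
- cake_3 is a different kind of cake --> in
- cake_4 is more expensive than cake_1, but bakes faster
- cake_5 is more expensive than cake_1 and cake_4, but bakes faster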
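The running time of your approach is roughly proportional to
len(list_of_cakes) * len(list_of_relevant_cakes)
... which can get quite large if you have many cakes and many of them are relevant.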
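We can improve on this by exploiting the fact that each cluster of cakes with the same ingredients is likely to be much smaller. First, we need a function that checks a new cake against an existing, already optimized cluster with the same ingredients: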
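from copy import copy

def update_cluster(cakes, new):
    # cakes is a set; iterate over a copy because the set may be
    # modified inside the loop.
    for c in copy(cakes):
        if c.baking_time <= new.baking_time and c.cake_price <= new.cake_price:
            break  # c is at least as good as new: do not add new
        elif c.baking_time >= new.baking_time and c.cake_price >= new.cake_price:
            cakes.discard(c)  # new is at least as good as c: drop c
    else:
        # Belongs to the for loop: runs only if no break occurred.
        cakes.add(new)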
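The select_from function used in the session below is not shown here. A minimal sketch of what it could look like, assuming it keeps one set of cakes per ingredient combination and feeds every further cake through update_cluster, is given next. The body of select_from is a reconstruction, not confirmed code, and the values of cake_2 are hypothetical; the other four cakes are taken from the output below.

```python
from collections import namedtuple

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])

def update_cluster(cakes, new):
    # Same logic as above; list() takes the snapshot instead of copy().
    for c in list(cakes):
        if c.baking_time <= new.baking_time and c.cake_price <= new.cake_price:
            break
        elif c.baking_time >= new.baking_time and c.cake_price >= new.cake_price:
            cakes.discard(c)
    else:
        cakes.add(new)

def select_from(cakes):
    # Assumed implementation: one set of relevant cakes per ingredient triple.
    clusters = {}
    for cake in cakes:
        key = (cake.ingredient1, cake.ingredient2, cake.ingredient3)
        if key in clusters:
            update_cluster(clusters[key], cake)
        else:
            clusters[key] = {cake}
    return [cake for cluster in clusters.values() for cake in cluster]

list_of_cakes = [
    Cake(1, 'dark chocolate', 'cookies', 'strawberries', 60, 20),
    # cake_2 values are hypothetical: same price as cake_1, longer baking time.
    Cake(2, 'dark chocolate', 'cookies', 'strawberries', 75, 20),
    Cake(3, 'white chocolate', 'bananas', 'strawberries', 150, 100),
    Cake(4, 'dark chocolate', 'cookies', 'strawberries', 40, 30),
    Cake(5, 'dark chocolate', 'cookies', 'strawberries', 10, 80),
]

result = select_from(list_of_cakes)
print(sorted(cake.cake_id for cake in result))
```

Because each cluster is a set, the order of cakes within a cluster is not guaranteed, which is why the ids are sorted before printing.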
Here it is in action:
>>> select_from(list_of_cakes)
[Cake(cake_id=1, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=60, cake_price=20),
Cake(cake_id=4, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=40, cake_price=30),
Cake(cake_id=5, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=10, cake_price=80),
Cake(cake_id=3, ingredient1='white chocolate', ingredient2='bananas', ingredient3='strawberries', baking_time=150, cake_price=100)]
The running time of this solution is roughly proportional to
len(list_of_cakes) * len(typical_cluster_size)
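I ran a small test with a list of random cakes, each choosing from five different ingredients, with random prices and baking times, and then ...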
Untested code, but it should help point the way to a better approach:
import collections
import operator

equivalence_fields = operator.attrgetter('ingredient1', 'ingredient2', 'ingredient3')
relevant_fields = operator.attrgetter('baking_time', 'cake_price')

def irrelevant(cake1, cake2):
    """cake1 is irrelevant if it is both
    more expensive and takes longer to bake.
    """
    return cake1.cake_price > cake2.cake_price and cake1.baking_time > cake2.baking_time

# Group equivalent cakes together
equivalent_cakes = collections.defaultdict(list)
for cake in cakes:
    feature = equivalence_fields(cake)
    equivalent_cakes[feature].append(cake)

# Weed-out irrelevant cakes within an equivalence class
for feature, group in equivalent_cakes.items():
    best = min(group, key=relevant_fields)
    group[:] = [cake for cake in group if not irrelevant(cake, best)]
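One caveat with this grouping idea: comparing every cake only against a single "best" cake can miss cakes that are beaten by some other member of the group. A possible variant, sketched here under the question's "equal or worse on both counts" rule, sorts each group by price and then baking time and keeps a cake only if it bakes strictly faster than every cake kept before it. The names pareto_front and relevant_cakes are hypothetical, and the values of cake 2 are made up for illustration.

```python
import collections
import operator
from collections import namedtuple

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])

ingredients = operator.attrgetter('ingredient1', 'ingredient2', 'ingredient3')

def pareto_front(group):
    """Keep the cakes of one ingredient group that no other cake beats."""
    kept = []
    fastest = float('inf')
    # After sorting, every earlier cake is at least as cheap, so a cake is
    # relevant only if it bakes strictly faster than all cakes kept so far.
    for cake in sorted(group, key=lambda c: (c.cake_price, c.baking_time)):
        if cake.baking_time < fastest:
            kept.append(cake)
            fastest = cake.baking_time
    return kept

def relevant_cakes(cakes):
    groups = collections.defaultdict(list)
    for cake in cakes:
        groups[ingredients(cake)].append(cake)
    return [cake for group in groups.values() for cake in pareto_front(group)]

cakes = [
    Cake(1, 'dark chocolate', 'cookies', 'strawberries', 60, 20),
    Cake(2, 'dark chocolate', 'cookies', 'strawberries', 75, 20),  # hypothetical values
    Cake(3, 'white chocolate', 'bananas', 'strawberries', 150, 100),
    Cake(4, 'dark chocolate', 'cookies', 'strawberries', 40, 30),
    Cake(5, 'dark chocolate', 'cookies', 'strawberries', 10, 80),
]

kept_ids = sorted(cake.cake_id for cake in relevant_cakes(cakes))
print(kept_ids)
```

The sort dominates the cost, so each group of size k takes O(k log k) instead of a pairwise comparison. Exact duplicates (same ingredients, price and baking time) are reduced to the first one encountered.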
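Brilliant! I adapted it to my real case and tested it with a list of 5330 namedtuples; the difference is huge. Running times before: 25.2s, 14.1s, 14.8s; running times after: 0.04s, 0.2s, 0.04s. Only one thing puzzles me: how does the else in the update_cluster function work? It does not have the same indentation as the if clause, so at first I thought it was a typo. Then I realized that the result is not computed correctly unless the else is indented the way you wrote it...
Glad it helped :-) The else in update_cluster() is attached to the for, not to the if. The for ... else construct is described in the Python documentation. Basically, the else block runs if the break was not triggered.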
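A small standalone illustration of the for ... else behaviour discussed in these comments may help. The function first_negative is a hypothetical example, unrelated to the cake code:

```python
def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving the loop via break skips the else block
    else:
        # Runs only when the loop finished without hitting break,
        # including the case where `numbers` is empty.
        result = None
    return result

print(first_negative([3, -1, 2]))  # -1
print(first_negative([1, 2]))      # None
```

In update_cluster() the same mechanism means the new cake is added only when no existing cake in the cluster caused a break.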