Python: comparing a few (but not all) elements of the namedtuples in a list


python list python-3.x


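I have a list of namedtuples that can get quite long (up to 10000 rows at the moment, but possibly more in the future).

I need to compare a few elements of each namedtuple against all the other namedtuples in the list. I am looking for an efficient and generic way to do this.

To keep things simple, I will use an analogy with cakes, which should make the problem easier to understand.

There is a list of namedtuples, where each namedtuple is a cake: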

from collections import namedtuple

Cake = namedtuple('Cake',
                  ['cake_id',
                   'ingredient1', 'ingredient2', 'ingredient3',
                   'baking_time', 'cake_price']
                  )

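Both cake_price and baking_time matter. Among cakes with the same ingredients, I want to remove the irrelevant ones from the list. A cake is irrelevant if another cake with the same ingredients is equally cheap or cheaper and takes the same time or less to bake (there is a detailed example below).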

What is the best way to do this?

Approach: what I have done so far is sort the list of namedtuples by cake_price and baking_time:

sorted_cakes = sorted(list_of_cakes, key=lambda c: (c.cake_price, c.baking_time))

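Then I build a new list and add each cake to it, unless a previously added cake with the same ingredients is at least as cheap and at least as fast to bake: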

def interesting_cake(current_cake, list_of_good_cakes):
    # A cake is not interesting if an already included cake with the same
    # ingredients is at least as cheap and at least as fast to bake.
    for included_cake in list_of_good_cakes:
        if (current_cake.ingredient1 == included_cake.ingredient1 and
                current_cake.ingredient2 == included_cake.ingredient2 and
                current_cake.ingredient3 == included_cake.ingredient3 and
                current_cake.baking_time >= included_cake.baking_time and
                current_cake.cake_price >= included_cake.cake_price):
            return False
    # Also covers the empty list: the first cake is appended directly
    return True

list_of_good_cakes = []
for cake in sorted_cakes:
    if interesting_cake(cake, list_of_good_cakes):
        list_of_good_cakes.append(cake)

(I know the nested loop is far from optimal, but I cannot think of any other way to do it...)

Example: given

list_of_cakes = [cake_1, cake_2, cake_3, cake_4, cake_5]

where cake_1 to cake_5 are Cake instances, the expected result would be:

list_of_relevant_cakes = [cake_1, cake_3, cake_4, cake_5]

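cake_1 is the cheapest (and the fastest at that price) --> in
cake_2 has the same price as cake_1 and takes longer to bake --> out
cake_3 is a different kind of cake --> in
cake_4 is more expensive than cake_1 but bakes faster --> in
cake_5 is more expensive than cake_1 and cake_4 but bakes faster --> in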

The running time of your approach is roughly proportional to

len(list_of_cakes) * len(list_of_relevant_cakes)

... which can get quite large if you have many cakes and many of them are relevant.

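We can improve on this by exploiting the fact that each cluster of cakes with the same ingredients is likely to be much smaller. First, we need a function that checks a new cake against an existing, already optimized cluster of cakes with the same ingredients: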

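from copy import copy

def update_cluster(cakes, new):
    # cakes is a set holding the already optimized cluster; iterate over a
    # copy because the set may be modified inside the loop
    for c in copy(cakes):
        if c.baking_time <= new.baking_time and c.cake_price <= new.cake_price:
            # an existing cake is at least as good: new is irrelevant
            break
        elif c.baking_time >= new.baking_time and c.cake_price >= new.cake_price:
            # new is at least as good as c: c becomes irrelevant
            cakes.discard(c)
    else:
        # the loop finished without a break: new is relevant
        cakes.add(new)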
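The session further down calls a select_from function that is not shown in this text. A minimal sketch of what such a driver could look like, assuming it groups the cakes by their three ingredients in a defaultdict of sets and feeds each cake to update_cluster. The body of select_from and the values of cake_2 are assumptions made for illustration; the other four cakes are taken from the output shown in the session.

```python
from collections import defaultdict, namedtuple
from copy import copy

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])

def update_cluster(cakes, new):
    # Same logic as the function above
    for c in copy(cakes):
        if c.baking_time <= new.baking_time and c.cake_price <= new.cake_price:
            break
        elif c.baking_time >= new.baking_time and c.cake_price >= new.cake_price:
            cakes.discard(c)
    else:
        cakes.add(new)

def select_from(cakes):
    # Assumed driver: one cluster (a set) per ingredient combination
    clusters = defaultdict(set)
    for cake in cakes:
        key = (cake.ingredient1, cake.ingredient2, cake.ingredient3)
        update_cluster(clusters[key], cake)
    # Flatten the clusters into a single list (order is not guaranteed)
    return [cake for cluster in clusters.values() for cake in cluster]

dark = ('dark chocolate', 'cookies', 'strawberries')
list_of_cakes = [
    Cake(1, *dark, baking_time=60, cake_price=20),
    Cake(2, *dark, baking_time=80, cake_price=20),  # hypothetical values
    Cake(3, 'white chocolate', 'bananas', 'strawberries',
         baking_time=150, cake_price=100),
    Cake(4, *dark, baking_time=40, cake_price=30),
    Cake(5, *dark, baking_time=10, cake_price=80),
]

print(sorted(c.cake_id for c in select_from(list_of_cakes)))  # [1, 3, 4, 5]
```

Sets are used for the clusters because update_cluster relies on add and discard; the returned list therefore comes back in no particular order.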

Here it is in action:

>>> select_from(list_of_cakes)
[Cake(cake_id=1, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=60, cake_price=20),
 Cake(cake_id=4, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=40, cake_price=30),
 Cake(cake_id=5, ingredient1='dark chocolate', ingredient2='cookies', ingredient3='strawberries', baking_time=10, cake_price=80),
 Cake(cake_id=3, ingredient1='white chocolate', ingredient2='bananas', ingredient3='strawberries', baking_time=150, cake_price=100)]

The running time of this solution is roughly proportional to

len(list_of_cakes) * len(typical_cluster_size)

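I ran a small test with lists of random cakes, each one picking from your five different ingredients, with random prices and baking times, and:

This approach always produces the same result as yours (although unsorted).

It runs considerably faster: 0.2 seconds for 100000 random cakes on my machine, compared with about 3 seconds for yours.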
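A test along these lines could be sketched as follows. This is not the code that was actually used: random_cakes, select_sorted (a compact restatement of the question's sort-and-scan approach) and select_from (an assumed driver around update_cluster) are names introduced here for illustration, and the value ranges are arbitrary.

```python
import random
from collections import defaultdict, namedtuple
from operator import attrgetter

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])
ingredients_of = attrgetter('ingredient1', 'ingredient2', 'ingredient3')

def select_sorted(cakes):
    # The question's approach: sort, then scan all cakes kept so far
    good = []
    for cake in sorted(cakes, key=lambda c: (c.cake_price, c.baking_time)):
        beaten = any(ingredients_of(g) == ingredients_of(cake)
                     and g.baking_time <= cake.baking_time
                     and g.cake_price <= cake.cake_price
                     for g in good)
        if not beaten:
            good.append(cake)
    return good

def update_cluster(cakes, new):
    for c in set(cakes):  # iterate over a copy, the set may change
        if c.baking_time <= new.baking_time and c.cake_price <= new.cake_price:
            break
        elif c.baking_time >= new.baking_time and c.cake_price >= new.cake_price:
            cakes.discard(c)
    else:
        cakes.add(new)

def select_from(cakes):
    # Assumed driver: one cluster per ingredient combination
    clusters = defaultdict(set)
    for cake in cakes:
        update_cluster(clusters[ingredients_of(cake)], cake)
    return [cake for cluster in clusters.values() for cake in cluster]

def random_cakes(n, rng):
    # Hypothetical generator; ranges for time and price are arbitrary
    names = ['dark chocolate', 'white chocolate', 'cookies',
             'bananas', 'strawberries']
    return [Cake(i, rng.choice(names), rng.choice(names), rng.choice(names),
                 rng.randint(1, 200), rng.randint(1, 100))
            for i in range(n)]

rng = random.Random(0)  # fixed seed so the run is repeatable
cakes = random_cakes(1000, rng)
# Compare as sets, since the clustered result is unsorted
same_result = set(select_from(cakes)) == set(select_sorted(cakes))
print(same_result)
```

Wrapping each call in time.perf_counter() would give a rough timing comparison; the numbers will depend on the machine and on how many cakes share the same ingredients.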

Untested code, but it should help point the way to a better approach:

import collections
import operator

equivalence_fields = operator.attrgetter('ingredient1', 'ingredient2', 'ingredient3')
relevant_fields = operator.attrgetter('baking_time', 'cake_price')

def irrelevant(cake1, cake2):
    """cake1 is irrelevant if it is both
       more expensive and takes longer to bake.
    """
    return cake1.cake_price > cake2.cake_price and cake1.baking_time > cake2.baking_time

# Group equivalent cakes together
# (cakes is the input list of Cake namedtuples)
equivalent_cakes = collections.defaultdict(list)
for cake in cakes:
    feature = equivalence_fields(cake)
    equivalent_cakes[feature].append(cake)

# Weed out irrelevant cakes within an equivalence class
for feature, group in equivalent_cakes.items():
    best = min(group, key=relevant_fields)
    group[:] = [cake for cake in group if not irrelevant(cake, best)]
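Note that the snippet above compares each cake only against the single best cake of its group and uses strict comparisons, so it may keep cakes that the rule in the question (same or higher price and same or longer baking time) would drop. One way to apply that rule in full within each group is a sort-and-sweep pass. This is a different technique from either answer, sketched here under the assumption that the rule is exactly as stated in the question; pareto_filter and the cake_2 values are made up for illustration.

```python
from collections import defaultdict, namedtuple
from operator import attrgetter

Cake = namedtuple('Cake', ['cake_id',
                           'ingredient1', 'ingredient2', 'ingredient3',
                           'baking_time', 'cake_price'])

ingredients_of = attrgetter('ingredient1', 'ingredient2', 'ingredient3')
price_then_time = attrgetter('cake_price', 'baking_time')

def pareto_filter(cakes):
    # Group cakes that share the same three ingredients
    groups = defaultdict(list)
    for cake in cakes:
        groups[ingredients_of(cake)].append(cake)

    relevant = []
    for group in groups.values():
        fastest_so_far = float('inf')
        # Cheapest first; every cake seen earlier is at least as cheap,
        # so a cake is relevant only if it bakes strictly faster than
        # everything seen so far in its group.
        for cake in sorted(group, key=price_then_time):
            if cake.baking_time < fastest_so_far:
                relevant.append(cake)
                fastest_so_far = cake.baking_time
    return relevant

dark = ('dark chocolate', 'cookies', 'strawberries')
cakes = [
    Cake(1, *dark, baking_time=60, cake_price=20),
    Cake(2, *dark, baking_time=80, cake_price=20),  # hypothetical values
    Cake(3, 'white chocolate', 'bananas', 'strawberries',
         baking_time=150, cake_price=100),
    Cake(4, *dark, baking_time=40, cake_price=30),
    Cake(5, *dark, baking_time=10, cake_price=80),
]

print(sorted(c.cake_id for c in pareto_filter(cakes)))  # [1, 3, 4, 5]
```

The work per group is dominated by the sort, so the cost no longer depends on how many cakes of a group turn out to be relevant.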

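Brilliant! I adapted it to my actual case. Tested with a list of 5330 namedtuples, the difference is huge. Running times before: 25.2s, 14.1s, 14.8s; running times after: 0.04s, 0.2s, 0.04s. Only one thing puzzles me: how does the else in the update_cluster function work? It does not have the same indentation as the if clause, so at first I thought it was a typo. Then I realized that the results are not computed correctly unless the else is indented the way you wrote it...

Glad it helped :-) The else in update_cluster() is attached to the for, not to the if. The for ... else construct is described in the official Python documentation. Basically, the else block runs if the break is not triggered.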
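A small standalone illustration of for ... else, separate from the cake code; first_negative is a made-up helper used only to show the control flow:

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break  # leaving the loop this way skips the else block
    else:
        # Runs only when the loop ended without hitting break,
        # including when the sequence is empty
        result = None
    return result

print(first_negative([3, -1, -5]))  # -1
print(first_negative([1, 2, 3]))    # None
```

In update_cluster the same mechanism means the new cake is added only when no existing cake in the cluster caused a break.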