Python: sorting a dependency list


I'm trying to work out whether my problem can be solved with the built-in sorted() function or whether I need to do it myself. The old-school way using cmp would have been relatively easy.

My data set looks like this:

x = [
    ('business', Set('fleet','address'))
    ('device', Set('business','model','status','pack'))
    ('txn', Set('device','business','operator'))
    ....

The sort rule should basically be: for all values of N and Y where Y > N, x[N][0] not in x[Y][1].

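Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to make this Python 3 safe.

So, can this be done using some lambda magic and the key argument?

-== UPDATE ==-

Thanks Eli & Winston! I really didn't think using key would work, and even if it could, I suspected it would be a shoehorned, non-ideal solution.

Because my problem involves database table dependencies, I had to make a minor addition to Eli's code to remove an item from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?).

My solution: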

def topological_sort(source):
    """Perform a topological sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: generator of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]  # copy deps so the caller's sets are untouched
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            # remove the self-reference and everything emitted so far
            # (passing several arguments to difference_update() requires Py2.6+)
            deps.difference_update(set((name,)), emitted)
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name)  # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            # nothing could be emitted in this pass: a cycle or a name that never appears as an item
            raise ValueError("cyclic dependency detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted
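
The self-reference removal described above can also be done as a separate preprocessing step, which keeps the sort itself unchanged. The following is only a minimal sketch in Python 3: the helper name `strip_self_dependencies` and the sample table names are hypothetical, not part of the original code.

```python
def strip_self_dependencies(source):
    """Return a copy of ``source`` with each name removed from its own dependency set.

    ``source`` is a list of ``(name, iterable of dependency names)`` pairs.
    """
    return [(name, set(deps) - {name}) for name, deps in source]


# Hypothetical input: 'business' lists itself as a dependency.
tables = [
    ('business', {'fleet', 'business'}),
    ('device', {'business'}),
]
cleaned = strip_self_dependencies(tables)
```

The input list is left unmodified, so the cleaned copy can be handed to whichever topological sort is in use.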

Glossing over the bad formatting and this strange Set type (I've kept the dependencies as tuples and delimited the list items correctly), and using the networkx library to make things convenient:
x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

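import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        # the edge points from an item to something it depends on, so each item
        # is listed before its dependencies; use g.add_edge(val, key) to list dependencies first
        g.add_edge(key, val)

# topological_sort() returns an iterator in networkx 2.x and later, so build a list to print it
print(list(nx.topological_sort(g)))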

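While it might be possible to implement this using the built-in sort(), it would be rather awkward, and it's better to implement a topological sort directly in Python.

Why would it be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes", a concept that is hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which elements will be examined. I'm (pretty) sure any solution that does use sort() will end up redundantly calculating some information on each call to the key/cmp function in order to get around the statelessness issue.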
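
To make the "marked nodes" point concrete, here is a minimal sketch of a depth-first topological sort in Python 3. It is not the code from any answer in this thread; the function name `dfs_topological_sort` and the dict-based input shape are assumptions chosen for illustration. The two sets are exactly the kind of running state that a key or cmp function has no natural place to keep.

```python
def dfs_topological_sort(graph):
    """Return a list of names with dependencies listed first.

    ``graph`` maps each name to an iterable of the names it depends on.
    Names that appear only as dependencies are included as well.
    """
    done = set()         # permanently marked: already placed in the output
    in_progress = set()  # temporarily marked: on the current DFS path
    order = []

    def visit(name):
        if name in done:
            return
        if name in in_progress:
            raise ValueError("cyclic dependency detected at %r" % (name,))
        in_progress.add(name)
        for dep in graph.get(name, ()):
            visit(dep)
        in_progress.discard(name)
        done.add(name)
        order.append(name)  # emitted only after all of its dependencies

    for name in graph:
        visit(name)
    return order


deps = {
    'business': ['fleet', 'address'],
    'device': ['business', 'model', 'status', 'pack'],
    'txn': ['device', 'business', 'operator'],
}
order = dfs_topological_sort(deps)
```

The recursion depth grows with the longest dependency chain, so this sketch suits small graphs such as table or library dependencies.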
The following is the algorithm I've been using (to sort some JavaScript library dependencies):

Edit: reworked this greatly, based on Winston Ewert's solution.
def topological_sort(source):
    """Perform a topological sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: generator of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]  # copy deps so we can modify set in-place
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted)  # remove deps we emitted last pass
            if deps:  # still has deps? recheck during next pass
                next_pending.append(entry)
            else:  # no more deps? time to emit
                yield name
                emitted.append(name)  # <-- not required, but helps preserve original ordering
                next_emitted.append(name)  # remember what we emitted for difference_update() in next pass
        if not next_emitted:  # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

I do a topological sort something like this:
class TopologicalSortFailure(Exception):
    """Raised when a pass emits nothing (cyclic or missing dependency)."""

def topological_sort(items):
    provided = set()  # names emitted so far
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if dependencies.issubset(provided):  # dependencies must be a set
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items

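I think it's a little simpler than Eli's version; I don't know about efficiency.

Here is Winston's suggestion, with a docstring and a tiny tweak: replacing dependencies.issubset(provided) with provided.issuperset(dependencies). That change lets you pass the dependencies in each input pair as an arbitrary iterable rather than necessarily a set.

My use case involves a dict whose keys are the item strings, with the value for each key being a list of the item names on which that key depends. Once I've established that the dict is non-empty, I can pass its iteritems() to the modified algorithm.

Thanks again to Winston.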
class TopologicalSortFailure(Exception):
    """Raised when a pass emits nothing (cyclic or missing dependency)."""

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of item names.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            raise TopologicalSortFailure()

        items = remaining_items
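
The dict-based use case described above might look like the following in Python 3, where `items()` replaces `iteritems()`. This is a sketch under stated assumptions: the `depends_on` data is hypothetical, and every name used as a dependency also appears as a key. With this algorithm, a name that is only ever listed as a dependency is never "provided", so the sort would raise TopologicalSortFailure.

```python
class TopologicalSortFailure(Exception):
    """Raised when no remaining item can be emitted (cycle or missing dependency)."""


def topological_sort(items):
    """Yield item names so that every item comes after its dependencies."""
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items


# Hypothetical data: each key maps to the list of names it depends on.
depends_on = {
    'app': ['lib', 'util'],
    'lib': ['util'],
    'util': [],
}
order = list(topological_sort(depends_on.items()))
print(order)  # ['util', 'lib', 'app']
```

In Python 3, `dict.items()` returns a view that is falsy when the dict is empty, so the non-empty check needed for a Python 2 `iteritems()` iterator is not required here.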

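@JonClements I'd assume he means sets.Set, although that was deprecated even in Python 2.6, which he says he's using. However, if that's what he means, then he needs to pass a single iterable to the constructor rather than multiple arguments.

Yes, sets.Set(). I'm supporting Python 2.3-3.1 environments, and "Set('item1','item2')" is what the Python interpreter prints for a list of tuples containing a string and a set (copied output, not the code that created it). I'm using sets because they ignore duplicates when they are added, which helps because the input used to create my data set is pretty ugly...

That's much simpler than my version. I think your main efficiency sink is the issubset() call, but that would only be a problem for large data sets. My version is hampered by its initial setup cost and is slower for small data sets, but it gets to modify the sets in place and so avoids issubset() when there are many dependencies. That said, your basic structure is still better, and I hope you don't mind that I've reworked my implementation to borrow bits of your solution :)

This was the only solution that worked for me... I don't like the dependency on other libs, but it was worth it for such a small amount of code.

One caveat with this solution: it only works if the dependencies form a fully connected graph. If there are nodes without any dependencies (and thus without any edges to other nodes), they will not be included in the output of topological_sort().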
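
For readers on Python 3.9 or later, the standard library's graphlib module addresses both of the last two comments: it needs no third-party package, and nodes without dependencies still appear in the output. This is a sketch, not something proposed in the thread; the 'audit' entry is a hypothetical table added to show an isolated node.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+

# Each key maps to the set of names it depends on.
deps = {
    'business': {'fleet', 'address'},
    'device': {'business', 'model', 'status', 'pack'},
    'txn': {'device', 'business', 'operator'},
    'audit': set(),  # hypothetical table with no dependencies at all
}

try:
    # static_order() yields every node, dependencies first.
    order = list(TopologicalSorter(deps).static_order())
except CycleError as exc:
    raise SystemExit("cyclic dependency: %r" % (exc.args,))
```

Names that occur only as dependencies (such as 'fleet') are treated as nodes too, so they also show up in `order`.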