Is there a faster way to get a subtree in Python than the standard recursive approach?

Assume the following data structure with three numpy arrays (id, parent_id) (the parent_id of the root element is -1):

For many IDs (>1000), fetching a subtree this way is very slow. Is there a faster way to do this?
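The question's code did not survive, so for experimenting with the answers, here is a hypothetical reconstruction of the "standard recursive" approach. The names `ids`, `parent_ids`, `id_successors` and `newRootElement` are taken from the fragments quoted in the last answer; the class name `MyStructure` and the example arrays (the five-node tree that one answer writes as a nested list) are assumptions, not the asker's actual code.

```python
import numpy as np


class MyStructure:
    # Hypothetical reconstruction, inferred from fragments quoted in the answers.
    def __init__(self, ids, parent_ids):
        self.ids = np.asarray(ids)
        self.parent_ids = np.asarray(parent_ids)  # -1 marks the root

    def id_successors(self, newRootElement):
        # Boolean mask of the direct children: a full O(n) scan per call.
        return self.parent_ids == newRootElement

    def subtree(self, newRootElement):
        # Naive recursion: one full-array scan for every visited node.
        result = [int(newRootElement)]
        for child in self.ids[self.id_successors(newRootElement)]:
            result.extend(self.subtree(child))
        return result


s = MyStructure([1, 2, 3, 4, 5], [-1, 1, 1, 3, 3])
print(s.subtree(3))  # [3, 4, 5]
print(s.subtree(1))  # [1, 2, 3, 4, 5]
```

Each visited node costs a scan of all n parents, which is why the approach slows down badly once there are thousands of IDs.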

In theory, every algorithm can be written either iteratively or recursively. But that is a fallacy (like Turing completeness). In practice, walking an arbitrarily nested tree iteratively is generally not feasible.

I doubt there is much left to optimize (at least you are modifying the subtree). Doing x on thousands of elements is inherently expensive, whether you do it iteratively or recursively. At most, a few micro-optimizations are possible in the concrete implementation, which would yield at most …
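To illustrate the opening point that the same traversal can be written either way, here is one pre-order subtree walk in both forms. The dict of child lists for a five-node tree (1 has children 2 and 3; 3 has children 4 and 5) is hypothetical example data.

```python
# Hypothetical example data: parent -> list of children.
children = {1: [2, 3], 2: [], 3: [4, 5], 4: [], 5: []}


def subtree_recursive(node):
    # Pre-order: the node itself, then each child's subtree in order.
    out = [node]
    for child in children[node]:
        out.extend(subtree_recursive(child))
    return out


def subtree_iterative(node):
    # An explicit stack replaces the call stack; children are pushed in
    # reverse so that they are popped (visited) in their original order.
    out, stack = [], [node]
    while stack:
        current = stack.pop()
        out.append(current)
        stack.extend(reversed(children[current]))
    return out


assert subtree_recursive(1) == subtree_iterative(1) == [1, 2, 3, 4, 5]
```

Both versions do the same amount of work per node, which supports the comments' point that the cost lies in the per-node work rather than in recursion as such. The iterative form does sidestep CPython's default recursion limit (1000) on very deep trees.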

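Have you considered a recursive data structure: lists?

Your example is also a standard list:

[1, [2], [3, [4], [5]]]

or

[1, [2, None, None], [3, [4, None, None], [5, None, None]]]

By yours truly:

The subtrees are ready-made; it only costs some time to insert the values into the right tree. It is also worth a try, to see whether it fits your needs.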
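A minimal sketch of the nested-list idea, assuming exactly one root marked by parent id -1. The helpers `to_nested` and `find` are hypothetical names; building the nested list is the one-off "time inserting the values" mentioned above, after which every sub-list already is a subtree.

```python
from collections import defaultdict


def to_nested(ids, parent_ids):
    # Build a nested-list tree [value, child1, child2, ...] from parent links.
    children = defaultdict(list)
    for i, p in zip(ids, parent_ids):
        children[p].append(i)

    def build(node):
        return [node] + [build(c) for c in children[node]]

    (root,) = children[-1]  # assumes exactly one root
    return build(root)


def find(tree, value):
    # Depth-first search for the sub-list whose head is `value`;
    # the returned sub-list is the subtree itself, no copying needed.
    if tree[0] == value:
        return tree
    for child in tree[1:]:
        hit = find(child, value)
        if hit is not None:
            return hit
    return None


tree = to_nested([1, 2, 3, 4, 5], [-1, 1, 1, 3, 3])
print(tree)           # [1, [2], [3, [4], [5]]]
print(find(tree, 3))  # [3, [4], [5]]
```

Locating a node still walks the tree; if lookups by id are frequent, a dict from id to its sub-list could be filled in during `build`.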

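Furthermore, Guido himself has shared some insights on traversal and trees; maybe you are already aware of them.

Here is some more advanced tree material, which was actually proposed for Python as a replacement for the basic list type but was rejected in that role.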

If you are using Python 2.6, have you tried the psyco module? It can sometimes speed up code dramatically.

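Here is my answer (written without access to your class, so the interface is slightly different, but I am attaching it as-is so you can test whether it is fast enough):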
=============================== file graph_array.py ===============================


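import collections
import numpy


def find_subtree(pids, subtree_id):
    """Return a boolean mask of the nodes in the subtree rooted at subtree_id.

    pids[k] is the parent id of node k + 1 (ids are 1-based); the root has -1.
    """
    N = len(pids)
    assert 1 <= subtree_id <= N

    subtreeids = numpy.zeros(pids.shape, dtype=bool)
    todo = collections.deque([subtree_id])  # breadth-first work queue

    steps = 0
    while todo:
        node = todo.popleft()
        assert 1 <= node <= N
        subtreeids[node - 1] = True

        # Children of `node`: positions whose parent is `node`, as 1-based ids.
        sons = (pids == node).nonzero()[0] + 1
        todo.extend(sons)

        # A tree has at most N nodes; more steps means the input has a cycle.
        steps = steps + 1
        if steps > N:
            raise ValueError("cycle detected in parent array")

    return subtreeids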


=============================== file test_graph_array.py ===============================

import numpy
from graph_array import find_subtree

def _random_graph(n, maxsons):
    # Random tree with n nodes in which no node has more than maxsons children.
    import random
    pids = numpy.zeros(n, dtype=int)
    sons = numpy.zeros(n, dtype=int)
    available = []
    for id in range(1, n+1):
        if available:
            pid = random.choice(available)

            sons[pid - 1] += 1
            if sons[pid - 1] == maxsons:
                available.remove(pid)
        else:
            pid = -1
        pids[id - 1] = pid
        available.append(id)
    assert sons.max() <= maxsons
    return pids

def verify_subtree(pids, subtree_id, subtree):
    ids = set(subtree.nonzero()[0] + 1)
    sons = set(ids) - set([subtree_id])
    fathers = set(pids[id - 1] for id in sons)
    leafs = set(id for id in ids if not (pids == id).any())
    rest = set(range(1, pids.size+1)) - fathers - leafs
    assert fathers & leafs == set()
    assert fathers | leafs == ids
    assert ids & rest == set()

def test_linear_graph_gen(n, genfunc, maxsons):
    assert maxsons == 1
    pids = genfunc(n, maxsons)

    last = -1
    seen = set()
    for _ in range(pids.size):
        # In a chain exactly one node has `last` as parent; take its position.
        id = int((pids == last).nonzero()[0][0]) + 1
        assert id not in seen
        seen.add(id)
        last = id
    assert seen == set(range(1, pids.size + 1))

def test_case1():
    """
            1
           / \
          2   4
         /
        3
    """
    pids = numpy.array([-1, 1, 2, 1])

    subtrees = {1: [True, True, True, True],
                2: [False, True, True, False],
                3: [False, False, True, False],
                4: [False, False, False, True]}

    for id in range(1, 5):
        sub = find_subtree(pids, id)
        assert (sub == numpy.array(subtrees[id])).all()
        verify_subtree(pids, id, sub)

def test_random(n, genfunc, maxsons):
    pids = genfunc(n, maxsons)
    for subtree_id in numpy.arange(1, n+1):
        subtree = find_subtree(pids, subtree_id)
        verify_subtree(pids, subtree_id, subtree)

def test_timing(n, genfunc, maxsons):
    import time
    pids = genfunc(n, maxsons)
    t = time.time()
    for subtree_id in numpy.arange(1, n+1):
        subtree = find_subtree(pids, subtree_id)
    t = time.time() - t
    print('t={0}s = {1:.2}ms/subtree = {2:.5}ms/subtree/node '.format(
        t, t / n * 1000, t / n**2 * 1000), end='')

def pytest_generate_tests(metafunc):
    # Parametrize every test taking (n, genfunc, maxsons); test_case1 has fixed data.
    if 'case' in metafunc.function.__name__:
        return
    ns = [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000]
    if 'timing' in metafunc.function.__name__:
        ns += [10000, 100000, 1000000]
    params, ids = [], []
    for n in ns:
        func = _random_graph
        for maxsons in sorted(set([1, 2, 3, 4, 5, 10, (n+1)//2, n])):
            params.append((n, func, maxsons))
            ids.append('n={0} {1.__name__}/{2}'.format(n, func, maxsons))
            if 'linear' in metafunc.function.__name__:
                break  # a linear chain only makes sense with maxsons == 1
    # metafunc.addcall() is gone from current pytest; parametrize() replaces it.
    metafunc.parametrize(('n', 'genfunc', 'maxsons'), params, ids=ids)

...
test_graph_array.py:72: test_timing[n=1000 _random_graph/1] t=13.4850590229s = 13.0ms/subtree = 0.013485ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/2] t=0.318281888962s = 0.32ms/subtree = 0.00031828ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/3] t=0.265519142151s = 0.27ms/subtree = 0.00026552ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/4] t=0.24147105217s = 0.24ms/subtree = 0.00024147ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/5] t=0.211434841156s = 0.21ms/subtree = 0.00021143ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/10] t=0.178458213806s = 0.18ms/subtree = 0.00017846ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/500] t=0.209936141968s = 0.21ms/subtree = 0.00020994ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/1000] t=0.245707988739s = 0.25ms/subtree = 0.00024571ms/subtree/node PASS
...
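In these timings the maxsons=1 case (a linear chain) is the outlier, at about 13 ms per subtree versus roughly 0.2 to 0.3 ms for bushier trees: every node popped from the queue triggers a full `(pids == id)` scan, so a long chain costs O(n) full scans per subtree. One possible fix, swapping in a technique that is not part of the answer above, is to build a parent-to-children index once with `numpy.argsort` and `numpy.searchsorted`, so each child lookup becomes a slice. `build_child_index` and `find_subtree_indexed` are hypothetical names, and the cycle guard of `find_subtree` is omitted for brevity.

```python
import collections
import numpy as np


def build_child_index(pids):
    # One O(n log n) pass: sort node positions by parent id so that all
    # children of a given parent sit in one contiguous slice of `order`.
    order = np.argsort(pids, kind="stable")
    sorted_pids = pids[order]
    ids = np.arange(1, len(pids) + 1)
    # For every id, the [start, end) bounds of its children inside `order`.
    start = np.searchsorted(sorted_pids, ids, side="left")
    end = np.searchsorted(sorted_pids, ids, side="right")
    return order, start, end


def find_subtree_indexed(pids, subtree_id, index=None):
    # Same breadth-first walk as find_subtree, but each child lookup is a
    # slice of the precomputed index instead of a full scan of `pids`.
    order, start, end = index if index is not None else build_child_index(pids)
    mask = np.zeros(pids.shape, dtype=bool)
    todo = collections.deque([subtree_id])
    while todo:
        node = todo.popleft()
        mask[node - 1] = True
        todo.extend(order[start[node - 1]:end[node - 1]] + 1)
    return mask


pids = np.array([-1, 1, 2, 1])  # the tree from test_case1
index = build_child_index(pids)
print(find_subtree_indexed(pids, 2, index).tolist())  # [False, True, True, False]
```

Passing the same `index` to repeated calls amortizes the sort over all subtree queries, as long as the tree does not change in between.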


I think what hurts you is not the recursion itself, but the multitude of very wide operations (over all elements) at every step. Consider:
init_vector[np.where(self.ids==newRootElement)[0]] = 1

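This scans all elements, computes the index of every matching element, and then takes only the first element of the result. This particular operation is available as the method index on lists, tuples and arrays, and it is faster there. If the ids are unique, init_vector is simply self.ids==newRootElement anyway.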
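A small sketch of the three variants, on assumed example data with unique ids. Note that `np.where(...)` returns a tuple of index arrays, so `[0]` selects the array of all matching positions along the first axis; with unique ids it holds a single index.

```python
import numpy as np

# Example data (assumed): unique ids, as in the question's structure.
ids = np.array([1, 2, 3, 4, 5])
new_root = 3

# Pattern from the question: build an indicator vector via np.where.
init_vector = np.zeros(len(ids), dtype=int)
init_vector[np.where(ids == new_root)[0]] = 1

# With unique ids the comparison alone already is the indicator vector.
mask = ids == new_root
assert (init_vector == mask).all()

# When only the position is needed, list.index stops at the first match.
ids_list = [1, 2, 3, 4, 5]
pos = ids_list.index(new_root)
print(pos)  # 2
```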
if sum(self.id_successors(newRootElement))==0:

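Again a linear scan over every element, followed by a reduction over the whole array, just to check whether any match is present. Use any() for this type of operation; but once again, we do not even need to check all elements: "if newRootElement not in self.parent_ids" does the job. Even that check is unnecessary, though, because a for loop over an empty list is perfectly valid.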
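The alternatives can be compared on assumed example data (the five-node tree with parents `[-1, 1, 1, 3, 3]`). Note that on a numpy array the `in` operator still compares every element; it only stops early on a plain Python list.

```python
import numpy as np

# Example data (assumed).
ids = np.array([1, 2, 3, 4, 5])
parent_ids = np.array([-1, 1, 1, 3, 3])
node = 4  # a leaf: no entry has it as parent

# Pattern from the question: compare everything, then sum everything.
no_children_sum = sum(parent_ids == node) == 0
# any() states the intent ("is there at least one match?") directly.
no_children_any = not (parent_ids == node).any()
# Membership test, equivalent here to (parent_ids == node).any().
no_children_in = node not in parent_ids

# The check can be dropped: iterating over an empty selection
# simply runs the loop body zero times.
visited = [int(c) for c in ids[parent_ids == node]]
print(visited)  # []
```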
Finally, there is the last loop:
for sucs in self.ids[self.id_successors(newRootElement)==1]:

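This time an id_successors call is repeated, and its result is then needlessly compared with 1. Only after that does the recursion happen, which makes sure all of the above operations are repeated (for a different newRootElement) in every branch.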
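A short sketch of the difference, on assumed example data: a boolean mask can index the array directly, so the `== 1` comparison only allocates another full-length array, and the mask can be computed once per node and reused.

```python
import numpy as np

# Example data (assumed).
ids = np.array([1, 2, 3, 4, 5])
parent_ids = np.array([-1, 1, 1, 3, 3])
node = 3

# Pattern from the question: mask rebuilt, then compared with 1.
children_old = ids[(parent_ids == node) == 1]

# Compute the mask once and index with it directly.
mask = parent_ids == node
children_new = ids[mask]
for child in children_new:  # empty for leaves, so no separate check needed
    print(int(child))       # 4, then 5
```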
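The whole code is a reversed traversal of a one-directional tree: we have the parents and need the children. If we are going to do wide operations of the kind numpy is designed for, we had better make them count, so the only such operation we care about is building a list of children for each parent. That is not hard to do in a single iteration: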
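import collections

# ids and parent_ids are the question's arrays (parent id -1 marks the root).
# Build the parent -> children mapping once, in a single pass.
children = collections.defaultdict(list)
for i, p in zip(ids, parent_ids):
    children[p].append(i)

def subtree(i):
    # (id, [subtree of each child]); a list comprehension rather than map(),
    # because on Python 3 map() returns a lazy iterator.
    return i, [subtree(c) for c in children[i]]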

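The exact structure you need will depend on more factors, such as how often the tree changes, how large it is, how much it branches, and how many subtrees you need to request and how large they are. The dictionary+list structure above, for instance, is not very memory efficient. Your example is also sorted, which could make the operation even simpler.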
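One way the sortedness could be exploited, sketched under a loud assumption: ids are 1..n stored in order, and every parent id is smaller than its child's id (parents appear before their children). Then a single forward pass from the root marks the whole subtree, with no recursion and no child lists. `subtree_mask_sorted` is a hypothetical name.

```python
import numpy as np


def subtree_mask_sorted(parent_ids, root_id):
    # ASSUMPTION: node k + 1 is stored at position k and every parent id
    # is smaller than its child's id, so descendants come after the root.
    n = len(parent_ids)
    mask = np.zeros(n, dtype=bool)
    mask[root_id - 1] = True
    for pos in range(root_id, n):  # only nodes after the root
        p = parent_ids[pos]
        # A node belongs to the subtree iff its parent already does.
        if p > 0 and mask[p - 1]:
            mask[pos] = True
    return mask


pids = np.array([-1, 1, 1, 3, 3])
print(subtree_mask_sorted(pids, 3).tolist())  # [False, False, True, True, True]
```

This is a plain Python loop, so it is O(n) per query with a sizeable constant; its appeal is simplicity when the ordering assumption holds.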