In Python, is there a faster way than the standard "recursive" approach? [python, tree, subtree]


Assume the following data structure, built from three numpy arrays (id, parent_id), where the root element has a parent_id of -1:


For many IDs (>1000), this retrieval is very slow. Is there a faster way to implement it?

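In theory, every algorithm can be written iteratively as well as recursively. But this is a fallacy (much like Turing completeness). In practice, walking an arbitrarily nested tree by iteration is usually not feasible. I doubt there is much left to optimize (at least you are modifying the subtree). Doing x on thousands of elements is inherently expensive, whether you do it iteratively or recursively. At most, a few micro-optimizations are possible in the concrete implementation, which will at most yield…

If you are using Python 2.6, have you tried the psyco module? It can sometimes speed up code considerably.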

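Have you considered a recursive data structure: the list?

Your example, written as standard lists:

[1, 2, [3, [4], [5]]]

[1, [2, None, None], [3, [4, None, None], [5, None, None]]]

As I see it:

The subtrees are already there; the cost is some time spent inserting values into the right tree. It is also worth trying this out to see whether it suits your needs.

Guido himself also offers some insight into traversal and trees, which you may already be aware of.


Here is some advanced-looking tree material that was actually proposed for Python as a replacement for the basic list type, but was rejected in that role.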
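
To make the nested-list idea concrete, here is a minimal sketch of how flat id/parent arrays might be turned into nested lists in one pass. The function name `build_nested`, the sample arrays, and the `[node_id, child, ...]` layout are assumptions made for illustration; they are not taken from the question or from this answer.

```python
import numpy as np

def build_nested(ids, parent_ids, root_parent=-1):
    """Return (root, nodes): the tree as nested lists plus a lookup by id.

    Each node is a list [node_id, child_subtree, ...]. Assumes every parent
    id also occurs in ids and that exactly one node has root_parent.
    """
    nodes = {int(i): [int(i)] for i in ids}   # one list object per node
    root = None
    for i, p in zip(ids, parent_ids):
        if p == root_parent:
            root = nodes[int(i)]
        else:
            # Append the child's own list, so subtrees are shared by reference.
            nodes[int(p)].append(nodes[int(i)])
    return root, nodes

# Hypothetical sample data: 1 is the root, 2 and 3 are its children,
# 4 and 5 are children of 3.
ids = np.array([1, 2, 3, 4, 5])
parent_ids = np.array([-1, 1, 1, 3, 3])

tree, nodes = build_nested(ids, parent_ids)
print(tree)       # [1, [2], [3, [4], [5]]]
print(nodes[3])   # [3, [4], [5]]
```

Once the structure is built, looking up `nodes[some_id]` hands back the whole subtree without any further scanning, which is the sense in which the subtrees are "already there".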

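Here is my answer (written without access to your class, so the interface is slightly different, but I am attaching it as is so that you can test whether it is fast enough):
=============================== file graph_array.py ==========================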


import collections
import numpy

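def find_subtree(pids, subtree_id):
    """Return a boolean mask marking subtree_id and all of its descendants.

    pids[k] is the parent id of node k + 1 (ids are 1-based, root parent is -1).
    """
    N = len(pids)
    assert 1 <= subtree_id <= N

    subtreeids = numpy.zeros(pids.shape, dtype=bool)
    todo = collections.deque([subtree_id])

    steps = 0
    while todo:
        node = todo.popleft()
        assert 1 <= node <= N
        subtreeids[node - 1] = True

        # Full scan of pids to find the direct children of this node.
        sons = (pids == node).nonzero()[0] + 1
        todo.extend(sons)

        # A valid tree visits each node at most once; more steps means a cycle.
        steps += 1
        if steps > N:
            raise ValueError('cycle detected in parent array')

    return subtreeids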

=============================== file test_graph_array.py ==========================

import numpy
from graph_array import find_subtree

def _random_graph(n, maxsons):
    import random
    pids = numpy.zeros(n, dtype=int)
    sons = numpy.zeros(n, dtype=int)
    available = []
    for id in xrange(1, n+1):
        if available:
            pid = random.choice(available)

            sons[pid - 1] += 1
            if sons[pid - 1] == maxsons:
                available.remove(pid)
        else:
            pid = -1
        pids[id - 1] = pid
        available.append(id)
    assert sons.max() <= maxsons
    return pids

def verify_subtree(pids, subtree_id, subtree):
    ids = set(subtree.nonzero()[0] + 1)
    sons = set(ids) - set([subtree_id])
    fathers = set(pids[id - 1] for id in sons)
    leafs = set(id for id in ids if not (pids == id).any())
    rest = set(xrange(1, pids.size+1)) - fathers - leafs
    assert fathers & leafs == set()
    assert fathers | leafs == ids
    assert ids & rest == set()

def test_linear_graph_gen(n, genfunc, maxsons):
    assert maxsons == 1
    pids = genfunc(n, maxsons)

    last = -1
    seen = set()
    for _ in xrange(pids.size):
        id = int((pids == last).nonzero()[0]) + 1
        assert id not in seen
        seen.add(id)
        last = id
    assert seen == set(xrange(1, pids.size + 1))

def test_case1():
    """
            1
           / \
          2   4
         /
        3
    """
    pids = numpy.array([-1, 1, 2, 1])

    subtrees = {1: [True, True, True, True],
                2: [False, True, True, False],
                3: [False, False, True, False],
                4: [False, False, False, True]}

    for id in xrange(1, 5):
        sub = find_subtree(pids, id)
        assert (sub == numpy.array(subtrees[id])).all()
        verify_subtree(pids, id, sub)

def test_random(n, genfunc, maxsons):
    pids = genfunc(n, maxsons)
    for subtree_id in numpy.arange(1, n+1):
        subtree = find_subtree(pids, subtree_id)
        verify_subtree(pids, subtree_id, subtree)

def test_timing(n, genfunc, maxsons):
    import time
    pids = genfunc(n, maxsons)
    t = time.time()
    for subtree_id in numpy.arange(1, n+1):
        subtree = find_subtree(pids, subtree_id)
    t = time.time() - t
    print 't={0}s = {1:.2}ms/subtree = {2:.5}ms/subtree/node '.format(
        t, t / n * 1000, t / n**2 * 1000),

def pytest_generate_tests(metafunc):
    if 'case' in metafunc.function.__name__:
        return
    ns = [1, 2, 3, 4, 5, 10, 20, 50, 100, 1000]
    if 'timing' in metafunc.function.__name__:
        ns += [10000, 100000, 1000000]
        pass
    for n in ns:
        func = _random_graph
        for maxsons in sorted(set([1, 2, 3, 4, 5, 10, (n+1)//2, n])):
            metafunc.addcall(
                funcargs=dict(n=n, genfunc=func, maxsons=maxsons),
                id='n={0} {1.__name__}/{2}'.format(n, func, maxsons))
            if 'linear' in metafunc.function.__name__:
                break
...
test_graph_array.py:72: test_timing[n=1000 _random_graph/1] t=13.4850590229s = 13.0ms/subtree = 0.013485ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/2] t=0.318281888962s = 0.32ms/subtree = 0.00031828ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/3] t=0.265519142151s = 0.27ms/subtree = 0.00026552ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/4] t=0.24147105217s = 0.24ms/subtree = 0.00024147ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/5] t=0.211434841156s = 0.21ms/subtree = 0.00021143ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/10] t=0.178458213806s = 0.18ms/subtree = 0.00017846ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/500] t=0.209936141968s = 0.21ms/subtree = 0.00020994ms/subtree/node PASS
test_graph_array.py:72: test_timing[n=1000 _random_graph/1000] t=0.245707988739s = 0.25ms/subtree = 0.00024571ms/subtree/node PASS
...
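
In the timings above, the maxsons=1 case (a single long chain) is by far the slowest. A plausible reason is that find_subtree scans the whole pids array once for every node it dequeues, and a chain has the deepest subtrees. The following is a sketch of one possible variation, not part of the answer: it sorts the parent array once and then looks up the children of a node with a binary search. The names `build_children_index` and `find_subtree_indexed` are hypothetical, the sketch assumes the parent array describes a valid tree (there is no cycle guard), and it has not been benchmarked against the numbers above.

```python
import numpy as np

def build_children_index(pids):
    """Sort node positions by parent id so siblings become contiguous."""
    order = np.argsort(pids, kind="stable")   # positions ordered by parent id
    sorted_pids = pids[order]                 # parent ids in ascending order
    child_ids = order + 1                     # 1-based node ids, same order
    return child_ids, sorted_pids

def find_subtree_indexed(pids, subtree_id, index=None):
    """Boolean mask of subtree_id and its descendants, like find_subtree."""
    if index is None:
        index = build_children_index(pids)
    child_ids, sorted_pids = index
    mask = np.zeros(pids.shape, dtype=bool)
    todo = [subtree_id]
    while todo:
        node = todo.pop()
        mask[node - 1] = True
        # All children of node sit in one contiguous slice of child_ids.
        lo = np.searchsorted(sorted_pids, node, side="left")
        hi = np.searchsorted(sorted_pids, node, side="right")
        todo.extend(child_ids[lo:hi].tolist())
    return mask

# Same tree as test_case1: 1 -> (2, 4), 2 -> 3.
pids = np.array([-1, 1, 2, 1])
index = build_children_index(pids)            # build once, reuse per query
print(find_subtree_indexed(pids, 2, index))   # [False  True  True False]
```

Building the index once and passing it to every query keeps the per-node cost at two binary searches instead of a full scan, at the price of rebuilding the index whenever the tree changes.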

I think what hurts you is not the recursion itself, but the large number of very wide operations (over all elements) at every step. Consider:

init_vector[np.where(self.ids==newRootElement)[0]] = 1
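This scans all elements and computes the index of every matching element, even though only a single match is of interest. This particular operation is available as the index method of list, tuple, and array, and is faster there. If the IDs are unique, init_vector is simply ids==newRootElement anyway.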
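
As a small illustration of that last point, the sketch below compares the indexed write with the plain comparison. The sample `ids` array and the name `new_root` are made up for the example, and the equivalence relies on the ids being unique.

```python
import numpy as np

ids = np.array([1, 2, 3, 4, 5])   # hypothetical, unique ids
new_root = 3

# Pattern from the question: full scan, index array, then an indexed write.
init_vector = np.zeros(len(ids), dtype=int)
init_vector[np.where(ids == new_root)[0]] = 1

# With unique ids, the comparison itself already is the start vector.
mask = ids == new_root

print(init_vector)   # [0 0 1 0 0]
print(mask)          # [False False  True False False]
```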

if sum(self.id_successors(newRootElement))==0:
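This is again a linear scan over every element, followed by a reduction over the whole array, just to check whether there are any matches. There are more suitable constructs for this kind of check, but once again we do not even need to examine all elements: "if newRootElement not in self.parent_id" would do, and even that is unnecessary, because running a for loop over an empty list is perfectly valid.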
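
A short sketch of the alternatives mentioned here, using made-up sample arrays. The use of the array method `any()` is my own choice of a suitable construct, and `has_children` is a hypothetical helper; neither is taken from the question's class.

```python
import numpy as np

ids = np.array([1, 2, 3, 4, 5])            # hypothetical sample data
parent_id = np.array([-1, 1, 1, 3, 3])

def has_children(node):
    # any() reports whether at least one element matches, without summing.
    return bool((parent_id == node).any())

print(has_children(3))    # True
print(4 in parent_id)     # False: node 4 is nobody's parent

# No emptiness check is needed: looping over zero children does nothing.
visited = []
for child in ids[parent_id == 4]:
    visited.append(int(child))
print(visited)            # []
```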

Finally, there is the last loop:

for sucs in self.ids[self.id_successors(newRootElement)==1]:
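This time, the id_successors call is repeated, and its result is then needlessly compared to 1. Only after that does the recursion happen, which makes sure that all of the operations above are repeated (for a different newRootElement) on every branch.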

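The whole code is a reverse traversal of a unidirectional tree: we have the parents and need the children. If we are going to do wide operations of the kind numpy is designed for, we had better make them count, so the only operation we care about is building a list of children for each parent. That is not hard to do in a single pass: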

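import collections

# ids and parent_ids are the arrays from the question
children = collections.defaultdict(list)
for i, p in zip(ids, parent_ids):
    children[p].append(i)

def subtree(i):
    # Nested (id, [child subtrees]) tuple; the list comprehension keeps the
    # result eager on Python 3, where map() returns a lazy iterator.
    return i, [subtree(c) for c in children[i]]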

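The exact structure you need will depend on further factors, such as how often the tree changes, how large it is, how much it branches, and how large and how numerous the subtrees you request are. The dictionary+list structure above, for example, is not very memory efficient. Your example is also sorted, which could make the operation even simpler.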
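
For completeness, here is a sketch of how the children mapping above might be used to collect the ids of a subtree without recursion, which avoids Python's recursion limit on very deep trees. The sample data and the function name `subtree_ids` are assumptions for the example.

```python
import collections

ids = [1, 2, 3, 4, 5]               # hypothetical sample data
parent_ids = [-1, 1, 1, 3, 3]

# One pass over the data builds the parent -> children mapping.
children = collections.defaultdict(list)
for i, p in zip(ids, parent_ids):
    children[p].append(i)

def subtree_ids(root):
    """Collect root and all of its descendants using an explicit stack."""
    result, stack = [], [root]
    while stack:
        node = stack.pop()
        result.append(node)
        # .get() avoids creating empty entries for leaf nodes.
        stack.extend(children.get(node, ()))
    return result

print(sorted(subtree_ids(3)))   # [3, 4, 5]
print(sorted(subtree_ids(1)))   # [1, 2, 3, 4, 5]
```

Each query touches only the nodes of the requested subtree, so the cost is proportional to the subtree size rather than to the size of the whole tree.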
