Python: inverting a list of indices

I have a list of indices, for example:

a = [
    [2],
    [0, 1, 3, 2],
    [1],
    [0, 3]
    ]

I now want to "invert" this list: the number 0 appears at indices 1 and 3, hence:

b = [
    [1, 3],
    [1, 2],
    [0, 1],
    [1, 3]
    ]

Any hints on how to do this fast? (The lists I am dealing with can be large.)

Bonus: I know that each index appears exactly twice in a (as in the example above).

Use a dictionary to collect the inverted indices, using enumerate() to generate the indices for the entries of a:

inverted = {}
for index, numbers in enumerate(a):
    for number in numbers:
        inverted.setdefault(number, []).append(index)

b = [inverted.get(i, []) for i in range(max(inverted) + 1)]

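A dictionary gives you efficient random access for adding the inversions, but it does mean that you need to account for indices that may be missing from the inversion. That is why the final step loops over range(max(inverted) + 1): it makes sure every index between 0 and the maximum is covered.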
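
As a quick check, the steps above can be wrapped in a function and run on the example from the question. This is only a sketch: the function name `invert_with_dict` and the guard for empty input are assumptions added for illustration and are not part of the original answer.

```python
def invert_with_dict(a):
    """Invert a list of index lists using a dictionary as scratch space."""
    inverted = {}
    for index, numbers in enumerate(a):
        for number in numbers:
            inverted.setdefault(number, []).append(index)

    # Assumption: return an empty list for empty input, since max() of an
    # empty dictionary would raise ValueError.
    if not inverted:
        return []

    # Fill in an empty list for every value that never appeared.
    return [inverted.get(i, []) for i in range(max(inverted) + 1)]


print(invert_with_dict([[2], [0, 1, 3, 2], [1], [0, 3]]))
# [[1, 3], [1, 2], [0, 1], [1, 3]]

print(invert_with_dict([[3], [0]]))
# [[1], [], [], [0]]
```

The second call shows the case the range() loop exists for: the values 1 and 2 never occur, so their slots come back as empty lists.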

Assuming each index appears exactly twice, the following code works:

from itertools import chain

a = [[2],
     [0, 1, 3, 2],
     [1],
     [0, 3]]

b = (max(chain(*a)) + 1) * [None]

for i, lst in enumerate(a):
    for j in lst:
        if not b[j]:
            b[j] = [i, None]
        else:
            b[j][1] = i

As @smarx pointed out, if we further assume that len(a) gives the range of the values, as it does in the example, the solution above can be simplified to:

a = [[2],
     [0, 1, 3, 2],
     [1],
     [0, 3]]

# One empty slot per value; each slot is replaced by its own [first, second] pair below
b = len(a) * [None]

for i, lst in enumerate(a):
    for j in lst:
        if not b[j]:
            b[j] = [i, None]
        else:
            b[j][1] = i
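
One thing worth keeping in mind when preallocating with list multiplication: multiplying a list repeats references to the same element objects. With an immutable placeholder such as None that is harmless, but with a mutable element every slot ends up pointing at one shared list. The short sketch below (variable names are illustrative only) shows the difference.

```python
n = 3

# Immutable placeholder: each slot can be rebound independently.
slots = n * [None]
slots[0] = [7, None]
print(slots)     # [[7, None], None, None]

# Mutable element: all three entries are the *same* list object.
shared = n * [[]]
shared[0].append(7)
print(shared)    # [[7], [7], [7]]

# A comprehension creates a fresh list for every slot.
separate = [[] for _ in range(n)]
separate[0].append(7)
print(separate)  # [[7], [], []]
```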

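Edit: comparison of the solutions.

For large arrays, using append is not the best choice, because the list may have to reallocate memory as it grows. Looping over the array a twice might therefore be faster.

To test this, I created a function gen_list that generates a list according to the assumptions in the question. The code is as follows: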

# This answer's solution
def solution1(a):
    from itertools import chain

    b = (max(chain(*a)) + 1)* [None]

    for i, lst in enumerate(a):
        for j in lst:
            if not b[j]:
                b[j] = [i, None]
            else:
                b[j][1] = i

    return b


# smarx's solution
def solution2(a):
    b = []

    for i, nums in enumerate(a):

        # For each number found at this index
        for num in nums:

            # If needed, extend b to cover the new needed range
            for _ in range(num + 1 - len(b)):
                b.append([])

            # Store the index
            b[num].append(i)

    return b


# Martijn Pieters's solution
def solution3(a):
    inverted = {}
    for index, numbers in enumerate(a):
        for number in numbers:
            inverted.setdefault(number, []).append(index)

    return [inverted.get(i, []) for i in range(max(inverted) + 1)]


# eugene y's solution
def solution4(a):
    b = []
    for i, lst in enumerate(a):
        for j in lst:
            if j >= len(b):
                b += [[] for _ in range(j - len(b) + 1)]
            b[j].append(i)

    return b


def gen_list(n):
    from numpy.random import choice
    lst = []
    for _ in range(n):
        lst.append([])
    for i in range(n):
        lst[choice(n)].append(i)
        lst[choice(n)].append(i)
    return lst

Then, timing the solutions gives:

In [1]: a = gen_list(10)

In [2]: %timeit solution1(a)
The slowest run took 8.68 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 9.45 µs per loop

In [3]: %timeit solution2(a)
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 14.5 µs per loop

In [4]: %timeit solution3(a)
100000 loops, best of 3: 12.2 µs per loop

In [5]: %timeit solution4(a)
The slowest run took 5.69 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 10.3 µs per loop

In [6]: a = gen_list(100)

In [7]: %timeit solution1(a)
10000 loops, best of 3: 70.5 µs per loop

In [8]: %timeit solution2(a)
10000 loops, best of 3: 135 µs per loop

In [9]: %timeit solution3(a)
The slowest run took 5.28 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 115 µs per loop

In [10]: %timeit solution4(a)
The slowest run took 6.75 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 76.6 µs per loop
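
The idea of looping over a twice, mentioned in the comparison above, is not spelled out in code in this thread. A minimal sketch of what it could look like follows: the first pass counts how often each value occurs, the second pass writes each index into a slot that was allocated up front. The name `invert_two_pass` and the helper lists are hypothetical, and whether this is actually faster than append-based code would have to be measured.

```python
def invert_two_pass(a):
    """Invert a list of index lists without append, using two passes over a."""
    # Size of the result: largest value + 1 (0 if a holds no values at all).
    size = max((num for nums in a for num in nums), default=-1) + 1

    # Pass 1: count the occurrences of every value.
    counts = [0] * size
    for nums in a:
        for num in nums:
            counts[num] += 1

    # Allocate every inner list at its final length.
    b = [[None] * count for count in counts]

    # Pass 2: write each index into the next free position of its inner list.
    filled = [0] * size
    for i, nums in enumerate(a):
        for num in nums:
            b[num][filled[num]] = i
            filled[num] += 1

    return b


print(invert_two_pass([[2], [0, 1, 3, 2], [1], [0, 3]]))
# [[1, 3], [1, 2], [0, 1], [1, 3]]
```

Unlike solution1, this sketch does not rely on each value appearing exactly twice, at the cost of the extra counting pass.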

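This code does not rely on the fact that each number appears exactly twice. It is also very simple, and it avoids the overhead of building a dictionary and then copying the results out of it: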

a = [
        [2],
        [0, 1, 3, 2],
        [1],
        [0, 3]
    ]

b = []

for i, nums in enumerate(a):

    # For each number found at this index
    for num in nums:

        # If needed, extend b to cover the new needed range
        b += [[] for _ in range(num + 1 - len(b))]

        # Store the index
        b[num].append(i)

print(b)

# Output:
# [[1, 3], [1, 2], [0, 1], [1, 3]]

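Here is a very simple O(n) solution that uses only lists. It does not rely on the fact that each index appears twice in a, and it makes no assumption about the range of the values in a.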
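
The code that accompanied this description is not shown at this point. A sketch consistent with it, modeled on the solution4 function from the benchmark above, might look like this; the function name `invert_lists_only` is an assumption made for the example.

```python
def invert_lists_only(a):
    """Invert a list of index lists using nothing but lists."""
    b = []
    for i, lst in enumerate(a):
        for j in lst:
            # Grow b on demand so that b[j] exists.
            if j >= len(b):
                b += [[] for _ in range(j - len(b) + 1)]
            b[j].append(i)
    return b


print(invert_lists_only([[2], [0, 1, 3, 2], [1], [0, 3]]))
# [[1, 3], [1, 2], [0, 1], [1, 3]]

# Values may occur any number of times, and gaps are allowed.
print(invert_lists_only([[4], [0, 4], [4]]))
# [[1], [], [], [], [0, 1, 2]]
```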

This should work:

import itertools
b = [[] for _ in range(1 + max(itertools.chain.from_iterable(a)))]
for i, lst in enumerate(a):
    for j in lst:
        if i not in b[j]:
            b[j].append(i)

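Note that the code above does not assume that the range of values that can appear in a is range(len(a)). To avoid repeated values in the sublists of b, I check before appending: if i not in b[j]: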
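
The membership test `i not in b[j]` scans the whole sublist each time. Because enumerate() yields i in increasing order, a repeated i can only ever be the most recently appended element, so comparing against the last element is enough. The sketch below illustrates that variation; the name `invert_dedup` and the empty-input guard are assumptions, not part of the answer.

```python
import itertools


def invert_dedup(a):
    """Invert a list of index lists, storing each index once per value."""
    values = list(itertools.chain.from_iterable(a))
    if not values:  # assumption: empty input gives an empty result
        return []

    b = [[] for _ in range(1 + max(values))]
    for i, lst in enumerate(a):
        for j in lst:
            # i only grows, so a duplicate of i could only sit at the end.
            if not b[j] or b[j][-1] != i:
                b[j].append(i)
    return b


print(invert_dedup([[0, 0, 1], [0]]))
# [[0, 1], [0]]
```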

>>> a = [[2], [0, 1, 3, 2], [1], [0, 3]]
>>> b = [[] for _ in range(sum(map(len, a)) // 2)]  # // keeps the size an int on Python 3
>>> for u, edges in enumerate(a):
...     for edge in edges:
...         b[edge].append(u)
...
>>> b
[[1, 3], [1, 2], [0, 1], [1, 3]]

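Basically the same as another answer. In addition, this one removes items from the original array as it goes, which makes the algorithm more memory-efficient (depending on how the garbage collector is implemented).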

Note that the order of the inner lists is reversed.
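
The code for the variant that consumes the original list is not shown here. Purely as an illustration of the idea, and not as a reconstruction of that answer, a version that pops sublists off the end of a could look like the sketch below. The name `invert_consuming` is hypothetical. Popping from the end visits the indices from highest to lowest, which is one way the inner lists end up in reverse order.

```python
def invert_consuming(a):
    """Invert a list of index lists, emptying a in the process."""
    b = []
    while a:
        nums = a.pop()  # removes the last sublist from a
        i = len(a)      # the index that sublist had in the original list
        for num in nums:
            # Grow b on demand so that b[num] exists.
            b += [[] for _ in range(num + 1 - len(b))]
            b[num].append(i)
    return b


a = [[2], [0, 1, 3, 2], [1], [0, 3]]
print(invert_consuming(a))
# [[3, 1], [2, 1], [1, 0], [3, 1]]
print(a)
# []
```

Each sublist becomes unreferenced as soon as it has been processed, so its memory can be reclaimed while the result is still being built. The caller loses the original list, which is only acceptable if a is not needed afterwards.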

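Is there a specific reason not to use defaultdict in this case?

@Fawful: simplicity. dict.setdefault() works just as well, needs no import, and lets the programmer control when new keys are added.

Since @Nico asked specifically about efficiency, I'd point out that after the dictionary has been built, this code incurs the overhead of creating a new list containing all the data. Depending on what is done with the list, it may be more efficient to enumerate the values directly from the dictionary instead of copying them into a list.

This code assumes that len(a) is also the range of values that can appear in a. For example, i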
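
For comparison, the defaultdict variant asked about in the comments could be sketched as follows. It also reads the result straight from the dictionary instead of copying it into a list, as one of the comments suggests. The function name `invert_with_defaultdict` is an assumption made for the example.

```python
from collections import defaultdict


def invert_with_defaultdict(a):
    """Collect the inverted indices in a defaultdict keyed by value."""
    inverted = defaultdict(list)
    for index, numbers in enumerate(a):
        for number in numbers:
            # A missing key is created with an empty list automatically.
            inverted[number].append(index)
    return inverted


inverted = invert_with_defaultdict([[2], [0, 1, 3, 2], [1], [0, 3]])

# Enumerate the values directly, without building a list first.
for number in sorted(inverted):
    print(number, inverted[number])
# 0 [1, 3]
# 1 [1, 2]
# 2 [0, 1]
# 3 [1, 3]
```

The trade-off raised in the comments is visible here: looking up a missing key with `inverted[key]` silently adds it to a defaultdict, whereas `inverted.get(key, [])` does not, so the setdefault() version gives more explicit control over when keys appear.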