Python: inverting a list of indices
I have a list of indices, for example
a = [
[2],
[0, 1, 3, 2],
[1],
[0, 3]
]
I now want to "invert" this list: the number 0 appears at indices 1 and 3, so:
b = [
[1, 3],
[1, 2],
[0, 1],
[1, 3]
]
Any hints on how to do this quickly? (The lists I'm working with can be large.)
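Bonus: I know that each index appears exactly twice in a (as in the example above).

Use a dictionary to collect the inverted indices, and use enumerate() to generate the indices for the entries of a: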
inverted = {}
for index, numbers in enumerate(a):
    for number in numbers:
        inverted.setdefault(number, []).append(index)
b = [inverted.get(i, []) for i in range(max(inverted) + 1)]
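The dictionary gives you efficient random access for adding the inverted indices, but it does mean you need to account for indices that may be missing from the inversion; hence the range(max(inverted) + 1) loop, which ensures that every index between 0 and the maximum is covered.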
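The comments further down ask why collections.defaultdict isn't used. For comparison, here is a minimal sketch of the same dictionary approach with defaultdict(list) in place of setdefault (invert_with_defaultdict is a hypothetical name). Using .get() in the last step, rather than indexing, avoids inserting empty lists for values that never occur.

```python
from collections import defaultdict


def invert_with_defaultdict(a):
    # Map each value to the list of row indices where it occurs.
    inverted = defaultdict(list)
    for index, numbers in enumerate(a):
        for number in numbers:
            inverted[number].append(index)
    # .get() does not trigger the default factory, so gaps become new empty
    # lists in the result without being added to the dictionary.
    return [inverted.get(i, []) for i in range(max(inverted) + 1)]


a = [[2], [0, 1, 3, 2], [1], [0, 3]]
print(invert_with_defaultdict(a))  # [[1, 3], [1, 2], [0, 1], [1, 3]]
```

Like the setdefault version, this raises ValueError when a contains no values at all, because max() then receives an empty dictionary.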
Assuming each index appears exactly twice, the following code works:
from itertools import chain
a = [[2],
     [0, 1, 3, 2],
     [1],
     [0, 3]]
b = (max(chain(*a)) + 1) * [None]
for i, lst in enumerate(a):
    for j in lst:
        if not b[j]:
            b[j] = [i, None]
        else:
            b[j][1] = i
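The two-slot code gives wrong output without any error when the exactly-twice assumption does not hold: a value that occurs only once keeps a trailing None, and a third occurrence overwrites the second slot. If the input is not guaranteed to satisfy the assumption, a cheap check could be run first. This is a sketch using collections.Counter; appears_exactly_twice is a hypothetical helper name.

```python
from collections import Counter
from itertools import chain


def appears_exactly_twice(a):
    """Return True if every value in `a` occurs exactly twice overall."""
    counts = Counter(chain.from_iterable(a))
    return all(c == 2 for c in counts.values())


a = [[2], [0, 1, 3, 2], [1], [0, 3]]
print(appears_exactly_twice(a))              # True
print(appears_exactly_twice([[0, 1], [1]]))  # False: 0 occurs only once
```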
As @smarx pointed out, if we further assume that len(a) is the range of the values, as in the example, the solution above can be simplified to:
a = [[2],
     [0, 1, 3, 2],
     [1],
     [0, 3]]
# [None], not [[None]]: each slot is replaced by its own list below
b = len(a) * [None]
for i, lst in enumerate(a):
    for j in lst:
        if not b[j]:
            b[j] = [i, None]
        else:
            b[j][1] = i
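When preallocating b from len(a), the repeated element has to be None rather than [None]. Multiplying a list that contains a mutable object repeats references to that one object, and a one-element list is always truthy, so with len(a) * [[None]] the `if not b[j]` branch never runs and `b[j][1] = i` raises IndexError. A short demonstration:

```python
n = 4

# Repetition copies references: all four slots are the same list object.
shared = n * [[None]]
print(shared[0] is shared[3])  # True
shared[0].append("x")
print(shared)  # [[None, 'x'], [None, 'x'], [None, 'x'], [None, 'x']]

# A one-element list is truthy, so `if not b[j]` would never fire.
print(not [None])  # False

# A comprehension builds a distinct inner list per slot.
separate = [[None] for _ in range(n)]
separate[0].append("x")
print(separate)  # [[None, 'x'], [None], [None], [None]]

# With plain None the slots are replaced, not mutated, so sharing is harmless.
b = n * [None]
b[0] = [1, None]
print(b)  # [[1, None], None, None, None]
```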
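Edit:
Comparison of the solutions.
For large arrays, using append may not be optimal, since a growing list has to reallocate memory from time to time. Looping over the array a twice might therefore be faster.
To test this, I wrote a function gen_list that generates a list satisfying the assumptions of the question. The code is as follows: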
# This answer's solution
def solution1(a):
    from itertools import chain
    b = (max(chain(*a)) + 1) * [None]
    for i, lst in enumerate(a):
        for j in lst:
            if not b[j]:
                b[j] = [i, None]
            else:
                b[j][1] = i
    return b

# smarx's solution
def solution2(a):
    b = []
    for i, nums in enumerate(a):
        # For each number found at this index
        for num in nums:
            # If needed, extend b to cover the new needed range
            for _ in range(num + 1 - len(b)):
                b.append([])
            # Store the index
            b[num].append(i)
    return b

# Martijn Pieters's solution
def solution3(a):
    inverted = {}
    for index, numbers in enumerate(a):
        for number in numbers:
            inverted.setdefault(number, []).append(index)
    return [inverted.get(i, []) for i in range(max(inverted) + 1)]

# eugene y's solution
def solution4(a):
    b = []
    for i, lst in enumerate(a):
        for j in lst:
            if j >= len(b):
                b += [[] for _ in range(j - len(b) + 1)]
            b[j].append(i)
    return b

def gen_list(n):
    from numpy.random import choice
    lst = []
    for _ in range(n):
        lst.append([])
    # Place every index i into two randomly chosen rows
    for i in range(n):
        lst[choice(n)].append(i)
        lst[choice(n)].append(i)
    return lst
Timing the solutions then gives:
In [1]: a = gen_list(10)
In [2]: %timeit solution1(a)
The slowest run took 8.68 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 9.45 µs per loop
In [3]: %timeit solution2(a)
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 14.5 µs per loop
In [4]: %timeit solution3(a)
100000 loops, best of 3: 12.2 µs per loop
In [5]: %timeit solution4(a)
The slowest run took 5.69 times longer than the fastest. This could mean that an intermediate result is being cached
100000 loops, best of 3: 10.3 µs per loop
In [6]: a = gen_list(100)
In [7]: %timeit solution1(a)
10000 loops, best of 3: 70.5 µs per loop
In [8]: %timeit solution2(a)
10000 loops, best of 3: 135 µs per loop
In [9]: %timeit solution3(a)
The slowest run took 5.28 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 115 µs per loop
In [10]: %timeit solution4(a)
The slowest run took 6.75 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 76.6 µs per loop
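The edit above suggests that looping over a twice could beat repeated append. Here is a sketch of that idea taken further; it is not one of the timed solutions. A counting pass sizes every inner list exactly, and a second pass writes each row index into place (invert_two_pass is a hypothetical name). CPython's list.append is amortized O(1), so whether this actually wins would have to be measured, for example with the same %timeit setup.

```python
def invert_two_pass(a):
    # Pass 1: count how many times each value occurs.
    size = 1 + max((j for row in a for j in row), default=-1)
    counts = [0] * size
    for row in a:
        for j in row:
            counts[j] += 1

    # Preallocate each inner list at its final length.
    b = [[None] * c for c in counts]
    fill = [0] * size  # next free slot in each inner list

    # Pass 2: write each row index into its slot, in increasing order of i.
    for i, row in enumerate(a):
        for j in row:
            b[j][fill[j]] = i
            fill[j] += 1
    return b


print(invert_two_pass([[2], [0, 1, 3, 2], [1], [0, 3]]))
# [[1, 3], [1, 2], [0, 1], [1, 3]]
```

Unlike solution1, this does not rely on the exactly-twice assumption: gaps become empty lists and repeated values simply get longer lists.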
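This code does not rely on each number appearing exactly twice. It is also quite simple, and it avoids the overhead of building a dictionary and then copying the results out of it: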
a = [
    [2],
    [0, 1, 3, 2],
    [1],
    [0, 3]
]
b = []
for i, nums in enumerate(a):
    # For each number found at this index
    for num in nums:
        # If needed, extend b to cover the new needed range
        b += [[] for _ in range(num + 1 - len(b))]
        # Store the index
        b[num].append(i)
print(b)
# Output:
# [[1, 3], [1, 2], [0, 1], [1, 3]]
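To illustrate the claim that this version does not depend on the exactly-twice assumption, here is the same loop wrapped in a function (invert_lists is a hypothetical name) and run on an input with a gap (values 1 and 2 never occur) and a value that occurs three times:

```python
def invert_lists(a):
    b = []
    for i, nums in enumerate(a):
        for num in nums:
            # Grow b so that b[num] exists; range() of a negative count is
            # empty, so this is a no-op when b is already long enough.
            b += [[] for _ in range(num + 1 - len(b))]
            b[num].append(i)
    return b


print(invert_lists([[3], [0, 3, 3]]))  # [[1], [], [], [0, 1, 1]]
```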
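Here is a very simple O(n) solution that uses only lists, and that also:
- does not rely on each index appearing exactly twice in a
- does not assume anything about the range of the values in a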
import itertools
b = [[] for _ in range(1 + max(itertools.chain.from_iterable(a)))]
for i, lst in enumerate(a):
    for j in lst:
        if i not in b[j]:
            b[j].append(i)
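Note that the code above does not assume that the values that can appear in a lie in range(len(a)). To avoid repeated values in the sublists of b, I check before appending: if i not in b[j]: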
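The `i not in b[j]` test scans b[j], so its cost grows with the length of each sublist. Because rows are visited in increasing order of i, any earlier copy of i in b[j] must be its last element, so comparing against b[j][-1] gives the same deduplication in constant time. A sketch, with invert_dedup as a hypothetical name:

```python
from itertools import chain


def invert_dedup(a):
    b = [[] for _ in range(1 + max(chain.from_iterable(a)))]
    for i, lst in enumerate(a):
        for j in lst:
            # Indices are appended in increasing order of i, so a duplicate
            # of i in b[j] can only be its last element.
            if not b[j] or b[j][-1] != i:
                b[j].append(i)
    return b


print(invert_dedup([[2], [0, 1, 3, 2], [1], [0, 3]]))  # [[1, 3], [1, 2], [0, 1], [1, 3]]
print(invert_dedup([[0, 0], [0]]))                     # [[0, 1]]
```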
This should work:
>>> a = [[2], [0, 1, 3, 2], [1], [0, 3]]
>>> b = [[] for _ in range(sum(map(len, a)) // 2)]
>>> for u, edges in enumerate(a):
...     for edge in edges:
...         b[edge].append(u)
...
>>> b
[[1, 3], [1, 2], [0, 1], [1, 3]]
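The size passed to range() is half the total number of entries, which equals the number of distinct values only under the bonus assumption: if the values are 0, ..., k-1 and each appears exactly twice, the rows together hold 2k entries. In Python 3 the halving must be integer division (//), because range() rejects floats. For the example:

```latex
\sum_{i} |a_i| = 2k
\quad\Longrightarrow\quad
k = \frac{1}{2}\sum_{i} |a_i| = \frac{1 + 4 + 1 + 2}{2} = 4
```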
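Basically the same as the answer above, except that this one deletes items from the original array as it goes, which makes the algorithm more memory-efficient (depending on how the garbage collector is implemented).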
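The code for this answer did not survive on this page. As a hypothetical sketch of the idea described, assuming "deleting items" means popping rows off the end of a as they are processed (invert_destructive is an illustrative name, not the author's code):

```python
def invert_destructive(a):
    # Sized as in the answer above: assumes each value appears exactly twice.
    b = [[] for _ in range(sum(map(len, a)) // 2)]
    u = len(a) - 1
    while a:
        edges = a.pop()  # remove the last row from `a`
        for edge in edges:
            b[edge].append(u)
        u -= 1
    return b


a = [[2], [0, 1, 3, 2], [1], [0, 3]]
print(invert_destructive(a))  # [[3, 1], [2, 1], [1, 0], [3, 1]]
print(a)                      # []: the input has been consumed
```

Popping from the end visits the rows in descending order, which is why each inner list comes out reversed. A popped row is only freed if nothing else still references it.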
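Note that the order of the inner lists is reversed.
Is there a particular reason not to use defaultdict here? @Fawful: simplicity. dict.setdefault() works just as well, needs no import, and lets the programmer control when new keys are added.
Since @Nico asked specifically about efficiency, I'd point out that after building the dictionary, this code incurs the overhead of creating a new list containing all of the data. Depending on what is done with the list, it may be more efficient to enumerate the values directly from the dictionary rather than copying them into a list.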
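A sketch of what enumerating directly from the dictionary could look like: a generator that walks 0..max and hands out the lists stored in the dictionary instead of collecting them into a new outer list (iter_inverted is a hypothetical name). The list comprehension in the answer already shares the inner lists rather than copying them, so what the generator saves is the outer list.

```python
def iter_inverted(inverted):
    """Yield (value, row_indices) for 0..max without building a new list."""
    empty = ()  # shared placeholder for values that never occur
    for i in range(max(inverted) + 1):
        # .get() returns the list stored in the dict; nothing is copied.
        yield i, inverted.get(i, empty)


inverted = {}
for index, numbers in enumerate([[2], [0, 1, 3, 2], [1], [0, 3]]):
    for number in numbers:
        inverted.setdefault(number, []).append(index)

for value, rows in iter_inverted(inverted):
    print(value, rows)
```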
This code assumes that len(a) is also the range of values that can appear in a. For example, i