Python: get the indices of an unordered list

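I have two lists (actually two dataframe columns). They contain the same elements, but one of the lists is unordered. I want to get the indices of the unordered list that correspond to the ordered list. Is there a simple way to do this?

That is, list1[index] == list2

I need to get the index variable.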

With lists, use:

l1 = ['a','b','c','d']
l2 = ['c','d','b','a']

[l1.index(x) for x in l2] #[2, 3, 1, 0]

If you are trying to do this in a pandas DataFrame, you can convert the np.array to a list and then convert the result back, like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'v1':np.array(l1), 'v2':np.array(l2)})

df['index_of_v2_in_v1'] = np.array([list(df['v1']).index(x) for x in list(df['v2'])])

df
# Result:
#   v1 v2  index_of_v2_in_v1
# 0  a  c                  2
# 1  b  d                  3
# 2  c  b                  1
# 3  d  a                  0
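
For larger frames, the round trip through Python lists can be avoided. The following is a minimal sketch, assuming the values in v1 are unique: pandas' Index.get_indexer looks up every value of v2 in an index built from v1 and returns -1 for any value it cannot find. The frame is the same toy data as above.

```python
import numpy as np
import pandas as pd

l1 = ['a', 'b', 'c', 'd']
l2 = ['c', 'd', 'b', 'a']
df = pd.DataFrame({'v1': l1, 'v2': l2})

# Build an index from v1 (values must be unique) and look up every v2 value.
positions = pd.Index(df['v1']).get_indexer(df['v2'])
df['index_of_v2_in_v1'] = positions

print(positions.tolist())  # [2, 3, 1, 0]
```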

If you are 100% sure that list 1 is sorted (as your question suggests), you can simply use np.argsort(l2) on the list or array, like this:

np.argsort(df['v2'])
# Returns:
#0    3
#1    2
#2    0
#3    1
#Name: v2, dtype: int64
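
It is worth checking which direction this permutation goes. np.argsort(l2) returns positions idx such that l2[idx] is sorted, that is l2[idx] == l1, whereas the question asks for l1[idx] == l2. The two are inverse permutations of each other. The sketch below assumes l1 is sorted and both lists hold the same unique elements; it shows that applying argsort a second time yields the inverse.

```python
import numpy as np

l1 = np.array(['a', 'b', 'c', 'd'])   # sorted
l2 = np.array(['c', 'd', 'b', 'a'])   # same elements, shuffled

order = np.argsort(l2)      # l2[order] == l1
ranks = np.argsort(order)   # inverse permutation: l1[ranks] == l2

print(order.tolist())  # [3, 2, 0, 1]
print(ranks.tolist())  # [2, 3, 1, 0]
```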

In this case, using map is about 3.6 times faster than the list comprehension:

from timeit import timeit

l1 = ['a','b','c','d']
l2 = ['c','d','b','a']

t1 = timeit('map(lambda e: l1.index(e), l2)', globals=globals())
t2 = timeit('[l1.index(x) for x in l2]', globals=globals())
print("t1 = %s, t2 = %s, t2/t1 = %s" % (t1, t2, t2/t1))

Result:

t1 = 0.32407195774213654, t2 = 1.162188749526786, t2/t1 = 3.586205846454439
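
One caveat about this measurement: in Python 3, map returns a lazy iterator, so the timed statement map(lambda e: l1.index(e), l2) only builds the iterator and never calls l1.index. The sketch below illustrates this with a hypothetical lookup helper that records its calls, and then times a like-for-like pair in which both statements build the full list. The names lookup, eager_stmt and comp_stmt are introduced here for illustration, and no particular timings are claimed.

```python
from timeit import timeit

l1 = ['a', 'b', 'c', 'd']
l2 = ['c', 'd', 'b', 'a']

calls = []
def lookup(e):
    calls.append(e)          # record that the lookup actually ran
    return l1.index(e)

lazy = map(lookup, l2)       # nothing has been evaluated yet
calls_before = len(calls)    # 0
result = list(lazy)          # consuming the iterator runs the lookups
calls_after = len(calls)     # 4

# Like-for-like statements: both build the full list of indices.
eager_stmt = 'list(map(lambda e: l1.index(e), l2))'
comp_stmt = '[l1.index(x) for x in l2]'
names = {'l1': l1, 'l2': l2}
t_map = timeit(eager_stmt, globals=names, number=10000)
t_comp = timeit(comp_stmt, globals=names, number=10000)
print("list(map) = %g, list comp = %g" % (t_map, t_comp))
```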

Edit: further comparisons, including the solution proposed by @jbch:

from timeit import timeit
from random import shuffle

for n in range(10, 70, 10):
    l1 = list(range(n))
    l2 = l1[:]
    shuffle(l2)

    t1 = timeit('indices = {val: i for i, val in enumerate(l1)}; [indices[x] for x in l2]', globals=globals())
    t2 = timeit('[l1.index(x) for x in l2]', globals=globals())
    t3 = timeit('map(lambda e: l1.index(e), l2)', globals=globals())
    print("n = %d, t1 = %g, t2 = %g, t3 = %g" % (n, t1, t2, t3))

Results:

n = 10, t1 = 3.25064, t2 = 3.70473, t3 = 0.339757
n = 20, t1 = 5.01145, t2 = 9.22295, t3 = 0.341116
n = 30, t1 = 7.18546, t2 = 16.6379, t3 = 0.344537
n = 40, t1 = 8.96271, t2 = 26.0522, t3 = 0.336952
n = 50, t1 = 11.0635, t2 = 37.7291, t3 = 0.341935
n = 60, t1 = 12.6453, t2 = 51.1519, t3 = 0.350777

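C8H10N4O2's answer has O(n^2) time complexity and will take a long time on large lists: each call to index() is O(n), and it is called n times.

If you need better performance, you can use this O(n) solution: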

l1 = ['a','b','c','d']
l2 = ['c','d','b','a']
indices = {val: i for i, val in enumerate(l1)}

[indices[x] for x in l2]

Building the dictionary is O(n), and each O(n) index() call can then be replaced by an O(1) dict lookup. So the complexity is O(n) + O(n) rather than O(n^2).
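
The same idea can be wrapped in a small helper, shown here as a sketch. The function name index_in and its error handling are illustrative and not part of the answer. It assumes the elements are hashable; if the reference list contains duplicates, it keeps the first position so that the result matches what l1.index would return.

```python
def index_in(reference, values):
    """Return, for each item of values, its position in reference."""
    positions = {}
    for i, val in enumerate(reference):
        # setdefault keeps the first occurrence, like list.index does
        positions.setdefault(val, i)
    try:
        return [positions[x] for x in values]
    except KeyError as exc:
        raise ValueError("%r is not in the reference list" % (exc.args[0],))

l1 = ['a', 'b', 'c', 'd']
l2 = ['c', 'd', 'b', 'a']
print(index_in(l1, l2))  # [2, 3, 1, 0]
```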

If you try both approaches with different list sizes, you will see that the larger the list, the worse index() performs by comparison:

from timeit import timeit
from random import shuffle

for n in range(0, 50, 5):
    l1 = list(range(n))
    l2 = l1[:]
    shuffle(l2)

    t1 = timeit('indices = {val: i for i, val in enumerate(l1)}; [indices[x] for x in l2]', 'from __main__ import l1, l2')
    t2 = timeit('[l1.index(x) for x in l2]', 'from __main__ import l1, l2')
    print("n = %s, t1 = %s, t2 = %s, t2/t1 = %s" % (n, t1, t2, t2/t1))

Results:

n = 0, t1 = 0.410041093826, t2 = 0.0470049381256, t2/t1 = 0.114634700847
n = 5, t1 = 1.01210093498, t2 = 0.980098009109, t2/t1 = 0.96837970921
n = 10, t1 = 1.70017004013, t2 = 2.06220698357, t2/t1 = 1.21294160872
n = 15, t1 = 2.12121200562, t2 = 3.28132796288, t2/t1 = 1.54691183823
n = 20, t1 = 2.64426398277, t2 = 4.81948184967, t2/t1 = 1.82261751515
n = 25, t1 = 3.42534303665, t2 = 6.57365703583, t2/t1 = 1.9191237098
n = 30, t1 = 3.95739603043, t2 = 8.52685213089, t2/t1 = 2.15466232475
n = 35, t1 = 4.24842405319, t2 = 10.8080809116, t2/t1 = 2.54402121265
n = 40, t1 = 4.75647592545, t2 = 13.3403339386, t2/t1 = 2.80466760427
n = 45, t1 = 5.33353281021, t2 = 15.6205620766, t2/t1 = 2.92874584865

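Consider also adding a data sample and the code you have tried so far.

What do you mean by "the indices of the unordered list that correspond to the ordered list"? Can you give an example?

Thanks. I was hoping for a numpy-based solution that avoids the loop. If there are no other options, I'll wait a bit longer and mark this as accepted.

@jbch: It is unfair to compare your answer with the list-comprehension solution proposed by @C8H10N4O2, which is slower than the map-based solution I proposed. With your code, my solution is still about 30 times faster for n = 45.

@sciroccorics I used list comprehensions for both. If you switch both to map, you get more or less the same ratio. On my machine and Python version, anyway.

If you want to compare, you have to use map for both. Replace [indices[x] for x in l2] on line 9 with map(lambda x: indices[x], l2).

I don't know why list comprehensions are so slow on your system; I only see a 20% difference on mine (Python 3.5 on Windows). I'm really puzzled that you see such a large performance difference between map and list comprehensions. What version of Python and which OS do you have?

@jbch: I'm using Python 3.6 on Windows 10. Also, my benchmark was run on a rather old laptop. I'm on vacation this week, but I'll try it on other computers next week.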