Python: speed up comparison of floats between lists


python performance numpy


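I have a block of code that does the following:

1. Take a float of index `indx` from the list `b_lst` below.
2. Check whether this float lies between the float of index `i` and the next float (index `i+1`) in the list `a_lst`.
3. If it does, store `indx` in a sub-list of a third list (`c_lst`), where the index of that sub-list is the index of the left float in `a_lst` (i.e. `i`).
4. Repeat for all floats in `b_lst`.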
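Because `a_lst` is sorted, steps 2 and 3 amount to a binary search: the `i` with `a_lst[i] < x <= a_lst[i+1]` is `bisect.bisect_left(a_lst, x) - 1`. A minimal pure-Python sketch using the standard `bisect` module; it ignores the `c` offset used in the MWE, and the helper names `bin_index` and `bin_indices` are hypothetical:

```python
import bisect

def bin_index(a, x):
    """Return i such that a[i] < x <= a[i+1], or None if x is outside (a[0], a[-1]].

    Hypothetical helper; `a` must be sorted in ascending order.
    """
    j = bisect.bisect_left(a, x)  # first position j with a[j] >= x
    if 0 < j < len(a):
        return j - 1
    return None

def bin_indices(a, b):
    """Group the indices of b by the interval of a that each value falls into."""
    groups = [[] for _ in range(len(a) - 1)]
    for indx, elem in enumerate(b):
        i = bin_index(a, elem)
        if i is not None:
            groups[i].append(indx)
    return groups

a = [1.0, 2.0, 3.0, 4.0]
b = [2.5, 0.5, 2.0, 4.0, 4.5, 1.5]
print(bin_indices(a, b))  # [[2, 5], [0], [3]]
```

Each lookup costs O(log len(a)) instead of a scan over every interval, which is the same idea `np.searchsorted` applies in C.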

Here is an MWE that shows what the code does:

```python
import numpy as np
import timeit

def random_data(N):
    # Generate some random data.
    return np.random.uniform(0., 10., N).tolist()

# Data lists.
# Note that a_lst is sorted (np.sort returns a NumPy array, not a list).
a_lst = np.sort(random_data(1000))
b_lst = random_data(5000)
# Fixed index value (int)
c = 25

def func():
    # Create empty list with as many sub-lists as elements present
    # in a_lst beyond the 'c' index.
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]

    # For each element in b_lst.
    for indx,elem in enumerate(b_lst):

        # For elements in a_lst beyond the 'c' index.
        for i in range(len(a_lst[c:])-1):

            # Check if 'elem' is between this a_lst element
            # and the next.
            if a_lst[c+i] < elem <= a_lst[c+(i+1)]:

                # If it is then store the index of 'elem' ('indx')
                # in the 'i' sub-list of c_lst.
                c_lst[i].append(indx)

    return c_lst

print(func())
# Time the function.
func_time = timeit.timeit(func, number=10)
print(func_time)
```

This is about 130 times faster than the original function (the `np.searchsorted`-based `func_opt2` below).

Update 3

Following Jaime's suggestion, I converted the result of `np.searchsorted` to a list with `.tolist()` (`func_opt3` below).
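Jaime's point is that indexing a NumPy array yields NumPy scalars (a NumPy integer type), while `.tolist()` returns plain Python ints, which are cheaper to work with inside a pure-Python loop. A small check of the types; the sample arrays are arbitrary:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = [2.5, 0.5, 4.0]

idx_array = np.searchsorted(a, b, side='left')  # NumPy integer array
idx_list = idx_array.tolist()                    # list of plain Python ints

print(type(idx_array[0]))  # a NumPy integer scalar type
print(type(idx_list[0]))   # <class 'int'>
print(idx_list)            # [2, 0, 3]
```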
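Answer

You want to look at numpy's `searchsorted`. Calling `np.searchsorted(a_lst, b_lst, side='right')` will return an array of indices, the same length as `b_lst`, holding for each item the position in `a_lst` before which it should be inserted to preserve order. This will be very fast, since it uses binary search and the looping happens in C. You could then create your subarrays with fancy indexing, e.g.: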
```python
>>> a = np.arange(1, 10)
>>> b = np.random.rand(100) * 10
>>> c = np.searchsorted(a, b, side='right')
>>> b[c == 0]
array([ 0.54620226,  0.40043875,  0.62398925,  0.40097674,  0.58765603,
        0.14045264,  0.16990249,  0.78264088,  0.51507254,  0.31808327,
        0.03895417,  0.92130027])
>>> b[c == 1]
array([ 1.34599709,  1.42645778,  1.13025996,  1.20096723,  1.75724448,
        1.87447058,  1.23422399,  1.37807553,  1.64118058,  1.53740299])
```
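The answer's example uses `side='right'`, which puts a value equal to a boundary into the interval on its right (`a[i-1] <= x < a[i]`). The question's condition `a_lst[i] < elem <= a_lst[i+1]` corresponds to `side='left'` (`a[i-1] < x <= a[i]`), which is what `func_opt2` below uses. The two only differ for values that coincide exactly with an element of the sorted array:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
x = np.array([1.5, 2.0, 3.0])  # 2.0 and 3.0 sit exactly on boundaries

left = np.searchsorted(a, x, side='left')    # a[i-1] <  x <= a[i]
right = np.searchsorted(a, x, side='right')  # a[i-1] <= x <  a[i]

print(left.tolist())   # [1, 1, 2]
print(right.tolist())  # [1, 2, 3]
```

With random floats, exact ties are rare, but choosing the matching `side` keeps the vectorized version bit-for-bit consistent with the original loop.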

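Comments:

Eww, lists: have you considered numpy? Eww, for loops: have you considered numpy? Seriously though, what are the sizes of all your data? Are the 200 and 1000 mentioned here just for dummy purposes, or are those the real sizes?

Yes, I know, but in my case lists and for loops are the quick-and-dirty way to code it; the performance-improvement stage comes afterwards. As for your question, they could grow a bit, say to 1000/5000, but I don't expect them to grow much more than that.

numpy is the clean and fast way to code: once you get used to it you almost never use lists again, and once you learn to (ab)use slicing you won't use for loops either.

@usethedeathstar The OP is probably implying: "How do I do this with numpy?"

You can get more performance out of `func_opt2` by converting the `searchsorted` result back to a list with `.tolist()`; operations on numpy-type scalars carry a considerable penalty. (Jaime)

I'm not sure how to apply this to my code. I need a list of `b_lst` indices stored in `c_lst`.

In that case, you need to use `np.where(c == 0)`, `np.where(c == 1)`, `np.where(c == 2)`, and so on.

OK, got it now. I'll update the question based on your answer. It's ugly and I'm pretty sure it can be optimized further, but it's a first step. Thanks.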
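Calling `np.where(c == k)` once per bin scans the whole index array for every bin. A single-pass alternative, not taken from the thread, sorts the indices of `b` by bin with a stable `np.argsort` and cuts them apart with `np.split`. This is a sketch assuming the question's interval convention `a[i] < x <= a[i+1]` and an `a` with at least two elements; the function name `group_by_bin` is hypothetical:

```python
import numpy as np

def group_by_bin(a, b):
    """Return a list whose i-th entry lists the indices k with a[i] < b[k] <= a[i+1].

    Hypothetical helper; `a` must be sorted ascending and have >= 2 elements.
    Indices inside each group stay in increasing order.
    """
    a = np.asarray(a)
    j = np.searchsorted(a, b, side='left')  # a[j-1] < b <= a[j]
    n_bins = len(a) - 1
    # Keep only values inside (a[0], a[-1]]; their bin number is j - 1.
    valid = np.nonzero((j > 0) & (j < len(a)))[0]
    bins = j[valid] - 1
    # A stable sort keeps the original index order within each bin.
    order = np.argsort(bins, kind='stable')
    sorted_idx = valid[order]
    # Bin sizes give the split points between consecutive groups.
    counts = np.bincount(bins, minlength=n_bins)
    groups = np.split(sorted_idx, np.cumsum(counts)[:-1])
    return [g.tolist() for g in groups]

a = [1.0, 2.0, 3.0, 4.0]
b = [2.5, 0.5, 2.0, 4.0, 4.5, 1.5]
print(group_by_bin(a, b))  # [[2, 5], [0], [3]]
```

The per-bin `np.where` approach costs O(len(a) * len(b)); this version is dominated by the sort, roughly O(len(b) log len(b)).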
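The `np.searchsorted`-based versions (`func_opt2`, and `func_opt3` with `.tolist()`):

```python
def func_opt2():
    # One sub-list per interval (a_lst[c+i], a_lst[c+i+1]].
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    # For each elem: first j with a_lst[c+j] >= elem, i.e. a_lst[c+j-1] < elem <= a_lst[c+j].
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left')
    for indx,elem in enumerate(c_opt):
        # j == 0 or j == len means elem lies outside (a_lst[c], a_lst[-1]].
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1].append(indx)
    return c_lst

def func_opt3():
    c_lst = [[] for _ in range(len(a_lst[c:])-1)]
    # .tolist() turns the NumPy integers into plain Python ints for the loop below.
    c_opt = np.searchsorted(a_lst[c:], b_lst, side='left').tolist()
    for indx,elem in enumerate(c_opt):
        if 0<elem<len(a_lst[c:]):
            c_lst[elem-1].append(indx)
    return c_lst
```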
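Since these versions replace the double loop with a different mechanism, it is worth checking that they build exactly the same `c_lst` as `func`. A sketch that turns the globals into arguments (the parameterized function names `brute` and `searchsorted_version` are hypothetical) and compares the two on random data:

```python
import numpy as np

def brute(a_lst, b_lst, c):
    """The question's original double loop, with globals turned into arguments."""
    n = len(a_lst[c:]) - 1
    c_lst = [[] for _ in range(n)]
    for indx, elem in enumerate(b_lst):
        for i in range(n):
            if a_lst[c + i] < elem <= a_lst[c + i + 1]:
                c_lst[i].append(indx)
    return c_lst

def searchsorted_version(a_lst, b_lst, c):
    """func_opt3 from the question, with globals turned into arguments."""
    tail = a_lst[c:]
    c_lst = [[] for _ in range(len(tail) - 1)]
    c_opt = np.searchsorted(tail, b_lst, side='left').tolist()
    for indx, elem in enumerate(c_opt):
        if 0 < elem < len(tail):
            c_lst[elem - 1].append(indx)
    return c_lst

rng = np.random.default_rng(0)
a_lst = np.sort(rng.uniform(0., 10., 100))
b_lst = rng.uniform(0., 10., 500).tolist()
c = 25
print(brute(a_lst, b_lst, c) == searchsorted_version(a_lst, b_lst, c))  # True
```

Smaller sizes than the MWE keep the brute-force reference quick; the equality holds for any sorted `a_lst` because `side='left'` reproduces the `<` / `<=` edges of the original test.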

