Python: looking to improve a verbose loop
In my code I need to compute, many times, the values of a vector whose entries are means over different patches of another array. Below is a sample of my code showing how I do it, but I find it too inefficient at runtime:
import numpy as np
vector_a = np.zeros(10)
array_a = np.random.random((100, 100))
for i in range(len(vector_a)):
    vector_a[i] = np.mean(array_a[:, i+20:i+40])
Is there a way to make it more efficient? Any comments or suggestions are welcome. Many thanks!
Yes, 20 and 40 are fixed.

Try this:
import numpy as np
array_a = np.random.random((100,100))
vector_a = [np.mean(array_a[:,i+20:i+40]) for i in range(10)]
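As a quick sanity check (a sketch, not part of the original answer), the comprehension can be verified against the question's loop:

```python
import numpy as np

np.random.seed(0)
array_a = np.random.random((100, 100))

# The question's explicit loop
vector_a = np.zeros(10)
for i in range(len(vector_a)):
    vector_a[i] = np.mean(array_a[:, i+20:i+40])

# The list-comprehension version from this answer
vector_b = [np.mean(array_a[:, i+20:i+40]) for i in range(10)]

print(np.allclose(vector_a, vector_b))  # → True
```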
EDIT:

Actually, you can do it even faster. The function from the original answer below can be improved by operating on the sums of the columns, as follows:
def rolling_means_faster1(array_a, n, first, size):
    # Sum each relevant column
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Reshape as before
    strides_b = (sum_a.strides[0], sum_a.strides[0])
    array_b = np.lib.stride_tricks.as_strided(sum_a, (n, size), strides_b)
    # Average
    v = np.sum(array_b, axis=1)
    v /= (len(array_a) * size)
    return v
Another way is to use a cumulative sum, adding and removing partial sums as needed for each output element:
def rolling_means_faster2(array_a, n, first, size):
    # Sum each relevant column
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Add a zero at the beginning so the next operation works fine
    sum_a = np.insert(sum_a, 0, 0)
    # Sum the initial `size` elements and add and remove partial sums as necessary
    v = np.sum(sum_a[:size]) - np.cumsum(sum_a[:n]) + np.cumsum(sum_a[-n:])
    # Average
    v /= (size * len(array_a))
    return v
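For the question's example, my reading is that the parameters map to n=10, first=20, size=20 (this mapping is my assumption, not stated in the answer); a self-contained check against the original loop:

```python
import numpy as np

def rolling_means_faster2(array_a, n, first, size):
    # Column sums of all columns touched by any window
    sum_a = np.sum(array_a[:, first:(first + size + n - 1)], axis=0)
    # Leading zero so the cumulative sums line up
    sum_a = np.insert(sum_a, 0, 0)
    # Start from the first window's sum, then slide it with cumulative sums
    v = np.sum(sum_a[:size]) - np.cumsum(sum_a[:n]) + np.cumsum(sum_a[-n:])
    v /= (size * len(array_a))
    return v

np.random.seed(0)
array_a = np.random.random((100, 100))

# Reference: the question's explicit loop
expected = np.array([np.mean(array_a[:, i+20:i+40]) for i in range(10)])

result = rolling_means_faster2(array_a, n=10, first=20, size=20)
print(np.allclose(expected, result))  # → True
```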
Benchmarking against the previous solutions:
import numpy as np
np.random.seed(100)
array_a = np.random.random((1000, 1000))
n = 100
first = 100
size = 200
%timeit rolling_means_orig(array_a, n, first, size)
# 12.7 ms ± 55.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rolling_means(array_a, n, first, size)
# 5.49 ms ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit rolling_means_faster1(array_a, n, first, size)
# 166 µs ± 874 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit rolling_means_faster2(array_a, n, first, size)
# 182 µs ± 2.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
So the last two seem to be very close in performance. Which one wins may depend on the relative sizes of the inputs.
Here is one possible vectorized solution:
import numpy as np
# Data
np.random.seed(100)
array_a = np.random.random((100, 100))
# Take all the relevant columns
slice_a = array_a[:, 20:40 + 10]
# Make a "rolling window" with stride tricks
strides_b = (slice_a.strides[1], slice_a.strides[0], slice_a.strides[1])
array_b = np.lib.stride_tricks.as_strided(slice_a, (10, 100, 20), strides_b)
# Take mean
result = np.mean(array_b, axis=(1, 2))
# Original method for testing correctness
vector_a = np.zeros(10)
idv1 = np.arange(10) + 20
idv2 = np.arange(10) + 40
for i in range(len(vector_a)):
    vector_a[i] = np.mean(array_a[:, idv1[i]:idv2[i]])
print(np.allclose(vector_a, result))
# True
A quick benchmark in IPython (with the sizes increased to make the comparison meaningful) is included in the EDIT above.
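As a side note (not from the original answer): NumPy 1.20+ offers np.lib.stride_tricks.sliding_window_view, which builds the same rolling window as the as_strided trick without manual stride arithmetic. A sketch, assuming that NumPy version is available:

```python
import numpy as np

np.random.seed(100)
array_a = np.random.random((100, 100))

# Columns 20..49 cover all ten width-20 windows
slice_a = array_a[:, 20:50]

# Shape (100, 11, 20): every width-20 window per row; keep the first 10
windows = np.lib.stride_tricks.sliding_window_view(slice_a, 20, axis=1)[:, :10, :]
result = windows.mean(axis=(0, 2))

# Compare with the question's explicit loop
expected = np.array([np.mean(array_a[:, i+20:i+40]) for i in range(10)])
print(np.allclose(expected, result))  # → True
```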
This solution works by assuming that you are trying to compute a subset of rolling means over column windows. For example, ignoring the rows, given [0, 1, 2, 3, 4] and windows of size 2, the means are [0.5, 1.5, 2.5, 3.5], and you might want only the second and third of those.

Your current solution is inefficient because it recomputes the mean for every output column in vector_a. Since (a/n) + (b/n) == (a + b)/n, we need to compute the mean of each column only once and can then combine the column means as needed to produce the final output:
window_first_start = idv1.min() # or idv1[0]
window_last_end = idv2.max() # or idv2[-1]
window_size = idv2[0] - idv1[0]
assert ((idv2 - idv1) == window_size).all(), "sanity check, not needed if assumption holds true"
# a view of the columns we are interested in, no copying is done here
view = array_a[:,window_first_start:window_last_end]
# calculate the means for each column
col_means = view.mean(axis=0)
# cumsum is used to find the rolling sum of means and so the rolling average
# We use an out variable to make sure we have a 0 in the first element of cum_sum.
# This makes life a little easier in the next step.
cum_sum = np.empty(len(col_means) + 1, dtype=col_means.dtype)
cum_sum[0] = 0
np.cumsum(col_means, out=cum_sum[1:])
result = (cum_sum[window_size:] - cum_sum[:-window_size]) / window_size
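The snippet above relies on idv1 and idv2 from the question's context; a self-contained version for the question's sizes (the setup values are mine) looks like:

```python
import numpy as np

np.random.seed(0)
array_a = np.random.random((100, 100))

# Window bounds as in the question
idv1 = np.arange(10) + 20
idv2 = np.arange(10) + 40

window_first_start = idv1.min()
window_last_end = idv2.max()
window_size = idv2[0] - idv1[0]

# View of the relevant columns (no copy), then one mean per column
view = array_a[:, window_first_start:window_last_end]
col_means = view.mean(axis=0)

# Rolling sums of column means via a cumulative sum with a leading zero
cum_sum = np.empty(len(col_means) + 1, dtype=col_means.dtype)
cum_sum[0] = 0
np.cumsum(col_means, out=cum_sum[1:])
result = (cum_sum[window_size:] - cum_sum[:-window_size]) / window_size

# Compare with the question's explicit loop
expected = np.array([np.mean(array_a[:, idv1[i]:idv2[i]]) for i in range(10)])
print(np.allclose(expected, result))  # → True
```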
Testing against your own code, the above method is measurably faster (increasingly so with larger input arrays), and slightly faster than the solution provided by jdehesa. With a 1000x1000 input array, it is two orders of magnitude faster than your solution and one order of magnitude faster than jdehesa's.
Can we assume that 20 and 40 are fixed inputs? I think the question is fine here; vectorizing some loop-based logic is a very common SO question. Possible duplicate: your question seems very similar to this one. Note that I added a faster solution to my answer.

This is basically the same, but shorter. It doesn't help with efficiency. @jdehesa Well, it is actually a little more efficient :)

@LinchengLi The standard deviation necessarily has to do more work, since it involves squares and a square root. You could do the computation "by hand", using one of the "faster" methods to compute the mean (speeding up that part) and then computing the rest yourself, but that would probably be slower than just taking my first answer and replacing np.mean with np.std (I don't know how NumPy implements std, but it may well be smarter than computing the mean first and then applying the formula). It is simply a lot of computation, and I am not sure there is much room to speed it up.
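For completeness, the "take the first answer and replace np.mean with np.std" option mentioned in that comment can be sketched as follows (same rolling-window view as in the vectorized answer; the setup values are mine):

```python
import numpy as np

np.random.seed(100)
array_a = np.random.random((100, 100))

# Same rolling-window view as in the vectorized answer
slice_a = array_a[:, 20:50]
strides_b = (slice_a.strides[1], slice_a.strides[0], slice_a.strides[1])
array_b = np.lib.stride_tricks.as_strided(slice_a, (10, 100, 20), strides_b)

# std instead of mean over each (rows x window) patch
result_std = np.std(array_b, axis=(1, 2))

# Compare with a direct per-window computation
expected = np.array([np.std(array_a[:, i+20:i+40]) for i in range(10)])
print(np.allclose(expected, result_std))  # → True
```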