Python: Upscaling a 1D array in numpy by averaging values
I have an array of data, for example:
data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
I want to "upscale" the data using the mean over a given number of samples:
n_steps = 2
upscaled_data = [2, 2, 5, 5, 4.5, 4.5, 4, 4, 3.5, 3.5]
Currently I am doing it like this:
import numpy as np

def upscale_by_mean(data, sample_rate):
    upscaled_data = np.array([])
    for i in range(sample_rate, len(data), sample_rate):
        mean = np.nanmean(data[i-(sample_rate):i])
        val_to_append = np.full(shape=(sample_rate,), fill_value=mean)
        upscaled_data = np.append(upscaled_data, val_to_append)
    # The last three lines handle the final chunk, which is shorter
    # when len(data) is not a multiple of sample_rate
    mean = np.nanmean(data[len(upscaled_data):])
    val_to_append = np.full(shape=(len(data)-len(upscaled_data),), fill_value=mean)
    upscaled_data = np.append(upscaled_data, val_to_append)
    return upscaled_data
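The above works as expected. However, when I scale it up to arrays of 50 million samples, the run time becomes worrying. It seems there should be a more efficient solution to this problem.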
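One likely contributor to the run time is that np.append allocates a new array and copies the existing contents on every call, so the loop copies the growing result over and over. A minimal sketch of the same loop writing into a preallocated output follows; the name upscale_by_mean_prealloc is hypothetical and not part of the original post. It only removes the repeated copying, and the Python-level loop over chunks remains.

```python
import numpy as np

def upscale_by_mean_prealloc(data, sample_rate):
    """Chunked mean written into a preallocated array (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    out = np.empty(data.size, dtype=float)
    # Walk over the chunk start offsets; the final chunk may be shorter.
    for start in range(0, data.size, sample_rate):
        stop = min(start + sample_rate, data.size)
        out[start:stop] = np.nanmean(data[start:stop])
    return out

data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
print(upscale_by_mean_prealloc(data, 2))
```

For the sample data and a chunk size of 2, this should reproduce the upscaled_data values listed in the question.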
Edit: An alternative solution with the same run time as the answer below is:
def upscale_by_mean(data, sample_rate=5):
    new_data = data[:len(data)-len(data)%sample_rate].reshape(len(data)//sample_rate, sample_rate)
    row_mean = np.nanmean(new_data, axis=1)
    upscaled_data = np.repeat(row_mean, sample_rate)
    if data.size % sample_rate:
        row_mean = np.nanmean(data[-(data.size % sample_rate):])
        val_to_append = np.full(shape=(len(data)-len(upscaled_data),), fill_value=row_mean)
        upscaled_data = np.append(upscaled_data, val_to_append)
    return upscaled_data
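The reshape idea above can also absorb the leftover samples by padding the array with NaN up to a multiple of sample_rate, so that np.nanmean handles the short final chunk without a separate branch. This is only a sketch under that assumption; upscale_by_mean_padded is a hypothetical name, and the padding costs one extra float copy of the data, which may matter at 50 million samples.

```python
import numpy as np

def upscale_by_mean_padded(data, sample_rate):
    """Reshape-based chunk mean with NaN padding (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    n_chunks = -(-n // sample_rate)  # ceiling division
    # Pad with NaN so the length is a multiple of sample_rate.
    padded = np.full(n_chunks * sample_rate, np.nan)
    padded[:n] = data
    # nanmean ignores the padding, so the last row averages only real samples.
    row_mean = np.nanmean(padded.reshape(n_chunks, sample_rate), axis=1)
    # Repeat each mean and trim back to the original length.
    return np.repeat(row_mean, sample_rate)[:n]

data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
print(upscale_by_mean_padded(data, 3))
```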
The best approach I can think of is to use np.add.reduceat to vectorize the summation:
def _upscale_by_mean(data, sample_rate):
    l = data.size
    ix = np.zeros(l, dtype = int)
    ix[::sample_rate] = 1
    vals = np.add.reduceat(data, np.flatnonzero(ix))/sample_rate
    if l % sample_rate:
        vals[-1] = data[-(l % sample_rate):].mean()
    return vals[ix.cumsum()-1]
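To make the reduceat step concrete, here is a small walk-through on the sample data from the question with a chunk size of 3. It is a variant of the function above rather than a copy: it divides each sum by the actual chunk length instead of patching the last value afterwards, and the names starts, counts and means are illustrative.

```python
import numpy as np

data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
sample_rate = 3

# Start index of every chunk: 0, 3, 6, 9
starts = np.arange(0, data.size, sample_rate)
# Sum of data[starts[i]:starts[i+1]]; the last segment runs to the end.
sums = np.add.reduceat(data, starts)            # 9, 14, 10, 5
# Length of every chunk; only the last one can be shorter.
counts = np.diff(np.append(starts, data.size))  # 3, 3, 3, 1
means = sums / counts
# Repeat each mean as many times as its chunk has samples.
upscaled = np.repeat(means, counts)
print(upscaled)
```

Note that np.add.reduceat propagates NaN, so a chunk containing a NaN yields NaN here, whereas the np.nanmean versions in the question ignore it.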
Test:
data = np.random.randint(10, size = 1000)
%timeit _upscale_by_mean(data, 2) #above
26.9 µs ± 756 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit upscale_by_mean(data, 2) #original
9.91 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)