Python: upscaling a 1D array in numpy by mean

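I have a data array, for example:

import numpy as np
data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])

I want to "upscale" the data by replacing every block of n_steps consecutive samples with the mean of that block: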

n_steps = 2
upscaled_data = [2, 2, 5, 5, 4.5, 4.5, 4, 4, 3.5, 3.5]

This is what I'm currently doing:

def upscale_by_mean(data, sample_rate):
    upscaled_data = np.array([])
    for i in range(sample_rate, len(data), sample_rate):
        mean = np.nanmean(data[i-(sample_rate):i])
        val_to_append = np.full(shape=(sample_rate,), fill_value=mean)
        upscaled_data = np.append(upscaled_data, val_to_append)
    
    # The last three lines handle the final chunk, which is shorter than
    # sample_rate when len(data) is not a multiple of sample_rate
    mean = np.nanmean(data[len(upscaled_data):])
    val_to_append = np.full(shape=(len(data)-len(upscaled_data),), fill_value=mean)
    upscaled_data = np.append(upscaled_data, val_to_append)
    
    return upscaled_data
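
The above works as expected. However, when I scale it up to arrays of 50 million samples, the runtime becomes a concern. It seems there should be a more efficient solution to this problem.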

Edit: an alternative solution with the same runtime as the answer below:

def upscale_by_mean(data, sample_rate=5):
    new_data = data[:len(data)-len(data)%sample_rate].reshape(len(data)//sample_rate, sample_rate)
    row_mean= np.nanmean(new_data, axis=1)
    upscaled_data = np.repeat(row_mean, sample_rate)
    if data.size % sample_rate:
        row_mean = np.nanmean(data[-(data.size % sample_rate):])
        val_to_append = np.full(shape=(len(data)-len(upscaled_data),), fill_value=row_mean)
        upscaled_data = np.append(upscaled_data, val_to_append)
    return upscaled_data
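
The remainder branch in the reshape approach can also be avoided by padding the array with NaN up to a multiple of sample_rate and letting np.nanmean ignore the padding. This is only a sketch of that idea, not something from the original post; the name upscale_by_mean_padded is made up for illustration, and it assumes the data can be converted to float.

```python
import numpy as np

def upscale_by_mean_padded(data, sample_rate):
    """Hypothetical variant: pad with NaN to a multiple of sample_rate, then reshape."""
    data = np.asarray(data, dtype=float)  # NaN padding requires a float array
    n = data.size
    pad = -n % sample_rate  # number of NaNs needed to complete the last chunk
    padded = np.concatenate([data, np.full(pad, np.nan)])
    # Each row is one chunk; nanmean skips the NaN padding in the last row
    chunk_means = np.nanmean(padded.reshape(-1, sample_rate), axis=1)
    # Repeat each mean sample_rate times, then trim the padded positions
    return np.repeat(chunk_means, sample_rate)[:n]

data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
print(upscale_by_mean_padded(data, 2))
print(upscale_by_mean_padded(data, 3))
```

The trade-off is one extra copy of the input (the padded array) in exchange for a single code path.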

The best I can come up with is to use np.add.reduceat to vectorize the summation:

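def _upscale_by_mean(data, sample_rate):
    l = data.size
    # Mark the first element of every chunk with a 1
    ix = np.zeros(l, dtype = int)
    ix[::sample_rate] = 1
    
    # Sum each chunk in one vectorized call, then divide to get the mean
    vals = np.add.reduceat(data, np.flatnonzero(ix))/sample_rate
    # The last chunk may be shorter than sample_rate, so recompute its mean
    if l % sample_rate:
        vals[-1] = data[-(l % sample_rate):].mean()
    # ix.cumsum()-1 maps every position to the index of its chunk
    return vals[ix.cumsum()-1]
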
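
To see what np.add.reduceat is doing here, it may help to walk through the example array from the question step by step. The snippet below is an illustrative sketch, not the code from the answer: it builds the chunk start indices with np.arange and divides by the actual chunk lengths, so the short last chunk needs no special case. The variable names are my own, and sample_rate = 3 is chosen only so that the last chunk is incomplete.

```python
import numpy as np

data = np.array([1, 3, 5, 5, 2, 7, 3, 5, 2, 5])
sample_rate = 3  # chosen so that the last chunk is incomplete

# Start index of every chunk: 0, 3, 6, 9
starts = np.arange(0, data.size, sample_rate)

# reduceat sums data[starts[i]:starts[i+1]]; the last slice runs to the end
chunk_sums = np.add.reduceat(data, starts)

# Length of each chunk, including the short one at the end
chunk_lengths = np.diff(np.append(starts, data.size))
chunk_means = chunk_sums / chunk_lengths

# Repeat each mean once per element of its chunk
upscaled = np.repeat(chunk_means, chunk_lengths)

print(chunk_sums)     # sums of [1,3,5], [5,2,7], [3,5,2], [5]
print(chunk_lengths)  # 3, 3, 3, 1
print(upscaled)
```

One difference from the question's code: a plain sum propagates NaN, whereas np.nanmean ignores it, so the two only agree on data without NaN values.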
Testing:

data = np.random.randint(10, size = 1000)

%timeit _upscale_by_mean(data, 2) #above
26.9 µs ± 756 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit upscale_by_mean(data, 2) #original
9.91 ms ± 308 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
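
%timeit only works inside IPython/Jupyter. For a plain script, a rough equivalent can be put together with the standard timeit module. The sketch below is an assumption-laden illustration rather than the benchmark used above: reshape_version and reduceat_version are my own compact rewrites of the two vectorized ideas in this thread, the array size of one million is arbitrary, and no timings are claimed here since they depend on the machine.

```python
import timeit
import numpy as np

def reshape_version(data, sample_rate):
    # Same idea as the reshape / np.repeat edit in the question
    n_full = data.size - data.size % sample_rate
    means = data[:n_full].reshape(-1, sample_rate).mean(axis=1)
    out = np.repeat(means, sample_rate)
    if data.size % sample_rate:
        tail = data[n_full:]
        out = np.append(out, np.full(tail.size, tail.mean()))
    return out

def reduceat_version(data, sample_rate):
    # Same idea as the np.add.reduceat answer, dividing by actual chunk lengths
    starts = np.arange(0, data.size, sample_rate)
    lengths = np.diff(np.append(starts, data.size))
    return np.repeat(np.add.reduceat(data, starts) / lengths, lengths)

rng = np.random.default_rng(0)
data = rng.integers(10, size=1_000_000)

# Make sure both versions agree before comparing their speed
assert np.allclose(reshape_version(data, 3), reduceat_version(data, 3))

for fn in (reshape_version, reduceat_version):
    # Best of 3 repeats, 10 calls each, reported per call
    seconds = min(timeit.repeat(lambda: fn(data, 3), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```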