Rolling window over the positive values of a list in Python


python pandas numpy


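What is a Pythonic way to compute the mean of a list while only considering the positive values?

Say I have the values [1,2,3,4,5,-1,4,2,3] and want the rolling average over three values. That is essentially the rolling average of [1,2,3,4,5,'nan',4,2,3], which becomes [nan,2,3,4,4.5,4.5,3,3,nan], where the first and last nan are due to the missing elements. Here 2 = mean([1,2,3]) and 3 = mean([2,3,4]), but 4.5 = mean([4,5,nan]) = mean([4,5]), and so on. So the important point is that negative values are excluded when they occur, and the division is by the number of positive values in the window.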

I tried:

def RollingPositiveAverage(listA,nElements):
     listB=[element for element in listA if element>0]
     return pd.rolling_mean(listB,3)

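But listB is then missing elements. I tried replacing those elements with nan, but then the mean itself becomes nan.

Is there a nice, elegant way to solve this?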
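To make the two failure modes in the question concrete, here is a minimal plain-Python sketch. It only uses the sample list from the question; the variable names are made up for illustration.

```python
import math

values = [1, 2, 3, 4, 5, -1, 4, 2, 3]

# Attempt 1: drop the negatives. The list gets shorter, so windows shift.
filtered = [v for v in values if v > 0]

# Attempt 2: keep the positions but mark negatives as NaN.
masked = [v if v > 0 else math.nan for v in values]
window = masked[3:6]                     # [4, 5, nan]
naive_mean = sum(window) / len(window)   # NaN propagates through the sum

# What is actually wanted: average only the valid entries of the window.
valid = [v for v in window if not math.isnan(v)]
wanted_mean = sum(valid) / len(valid)

print(len(values), len(filtered))  # 9 8
print(naive_mean)                  # nan
print(wanted_mean)                 # 4.5
```

So the positions have to be kept (to preserve the window alignment), and the invalid entries have to be skipped inside each window rather than removed up front.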

Since you are using Pandas:

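import numpy as np
import pandas as pd

def RollingPositiveAverage(listA, window=3):
    # Use a float Series so that negative entries can be replaced with NaN
    s = pd.Series(listA, dtype=float)
    s[s < 0] = np.nan
    # min_periods=1: a window with some NaN still yields the mean of its valid values
    result = s.rolling(window, center=True, min_periods=1).mean()
    # Blank out the edges, where the centered window is incomplete
    half = window // 2
    if half:  # for window=1, iloc[-0:] would select everything
        result.iloc[:half] = np.nan
        result.iloc[-half:] = np.nan
    return result  # or result.values or list(result) if you prefer array or list

print(RollingPositiveAverage([1, 2, 3, 4, 5, -1, 4, 2, 3]))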

Pure Python version:

import math

def RollingPositiveAverage(listA, window=3):
    result = [math.nan] * (window // 2)
    for win in zip(*(listA[i:] for i in range(window))):
        win = tuple(v for v in win if v >= 0)
        # max(..., 1) avoids dividing by zero when a window has no valid values
        result.append(float(sum(win)) / max(len(win), 1))
    result.extend([math.nan] * (window // 2))
    return result

print(RollingPositiveAverage([1, 2, 3, 4, 5, -1, 4, 2, 3]))

Output:

0    NaN
1    2.0
2    3.0
3    4.0
4    4.5
5    4.5
6    3.0
7    3.0
8    NaN
dtype: float64

[nan, 2.0, 3.0, 4.0, 4.5, 4.5, 3.0, 3.0, nan]
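A window that contains no non-negative values has nothing to average. The Pandas version gives NaN for such a window; below is a minimal sketch of a pure-Python variant that does the same. The function name rolling_positive_average_nan is hypothetical, and the sketch assumes an odd window size.

```python
import math

def rolling_positive_average_nan(values, window=3):
    """Centered rolling mean over non-negative values; NaN when a window has none."""
    half = window // 2
    result = [math.nan] * half            # left edge: incomplete windows
    for start in range(len(values) - window + 1):
        valid = [v for v in values[start:start + window] if v >= 0]
        result.append(sum(valid) / len(valid) if valid else math.nan)
    result.extend([math.nan] * half)      # right edge: incomplete windows
    return result

print(rolling_positive_average_nan([1, 2, -1, -2, -3, 4, 5]))
# [nan, 1.5, 2.0, nan, 4.0, 4.5, nan]
```

The NaN in the middle comes from the window [-1, -2, -3], which has no valid element.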

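Get the rolling sums, get the count of valid elements in each window by taking rolling sums of the mask of positive elements, and then simply divide the two to get the average values. For the rolling sums we can use np.convolve.

Hence, an implementation -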

import numpy as np

def rolling_mean(a, W=3):
    a = np.asarray(a) # convert to array
    k = np.ones(W) # kernel for convolution

    # Mask of positive numbers and get clipped array
    m = a>=0
    a_clipped = np.where(m,a,0)

    # Get rolling windowed summations and divide by the rolling valid counts
    return np.convolve(a_clipped,k,'same')/np.convolve(m,k,'same')

Extending this to the specific case of NaN-padding at the boundaries -

def rolling_mean_pad(a, W=3):
    hW = (W-1)//2 # half window size for padding
    a = np.asarray(a) # convert to array
    k = np.ones(W) # kernel for convolution

    # Mask of positive numbers and get clipped array
    m = a>=0
    a_clipped = np.where(m,a,0)

    # Get rolling windowed summations and divide by the rolling valid counts
    out = np.convolve(a_clipped,k,'same')/np.convolve(m,k,'same')
    if hW > 0: # guard: out[-0:] would select the whole array
        out[:hW] = np.nan
        out[-hW:] = np.nan
    return out

Sample run -

In [54]: a
Out[54]: array([ 1,  2,  3,  4,  5, -1,  4,  2,  3])

In [55]: rolling_mean_pad(a, W=3)
Out[55]: array([ nan,  2. ,  3. ,  4. ,  4.5,  4.5,  3. ,  3. ,  nan])
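For readers who find the convolution hard to follow, here is the same sums-and-counts idea written as plain Python loops instead of np.convolve. This is a sketch for illustration, not the code from the answer: the helper name rolling_sums_and_counts is made up, and it assumes an odd window size, with positions past either end of the list treated as empty.

```python
def rolling_sums_and_counts(values, window=3):
    """For each position, sum and count the non-negative values in the
    centered window, treating positions past either end as empty."""
    half = window // 2
    n = len(values)
    sums, counts = [], []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        valid = [v for v in values[lo:hi] if v >= 0]
        sums.append(sum(valid))
        counts.append(len(valid))
    return sums, counts

values = [1, 2, 3, 4, 5, -1, 4, 2, 3]
sums, counts = rolling_sums_and_counts(values)
# Divide sum by count; a window with no valid value has no mean
means = [s / c if c else float("nan") for s, c in zip(sums, counts)]
print(sums)    # [3, 6, 9, 12, 9, 9, 6, 9, 5]
print(counts)  # [2, 3, 3, 3, 2, 2, 2, 3, 2]
print(means)   # [1.5, 2.0, 3.0, 4.0, 4.5, 4.5, 3.0, 3.0, 2.5]
```

The interior values match the sample run above. The first and last entries (1.5 and 2.5) come from incomplete windows at the edges, which is what rolling_mean_pad overwrites with NaN.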

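Thanks. This looks like the most promising answer. What does "s[s<0]=np.nan" do?

@PietroSperoni That replaces the negative values in s with nan, so the rolling mean skips those values in the calculation. Passing min_periods=1 afterwards allows the mean to be computed over windows with missing values (including the start and the end, which is why I added the following lines to manually replace those with NaN, as in the expected output).

Would this work if the window is bigger and has more NaN inside? Say I compute it over a window of 360 (in a list of 1800 values), with 800 nan scattered through the list?

@PietroSperoni You can try it, but it should work as long as there is at least one valid positive value in the window. Obviously, if you have long runs of negative values and/or NaNs, you will see NaNs in the output too, because there are no valid elements to average (in the Pandas version; the Python version will now produce zeros for those windows)...

@PietroSperoni Pandas can "skip" them because the min_periods parameter allows it to compute over windows with missing values, but in any case, yes, be sure to test it before accepting.

Thanks a lot. I can't understand this answer, sorry.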
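The min_periods rule discussed in the comments can be sketched in plain Python. This is not how Pandas implements it; window_mean is a hypothetical helper that only imitates the rule as I understand it: a window produces a value only if it holds at least min_periods non-NaN observations, and for a fixed-size window the default min_periods is the window size.

```python
import math

def window_mean(window_values, min_periods):
    """Mean of the non-NaN entries, or NaN if fewer than min_periods are valid."""
    valid = [v for v in window_values if not math.isnan(v)]
    if not valid or len(valid) < min_periods:
        return math.nan
    return sum(valid) / len(valid)

window = [4.0, 5.0, math.nan]
strict = window_mean(window, min_periods=3)          # like the default for a window of 3
lenient = window_mean(window, min_periods=1)         # like min_periods=1
empty = window_mean([math.nan] * 3, min_periods=1)   # nothing valid to average

print(strict)   # nan
print(lenient)  # 4.5
print(empty)    # nan
```

This is why the question's attempt with nan produced nan means, and why a window made up entirely of negative values or NaNs still yields NaN even with min_periods=1.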