Python pandas: variable rolling window

python pandas dataframe numpy

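I want to create an iterative rolling process on a pandas DataFrame that stops when a specific condition is met. Specifically, I want the function to check the sum of the values over the window and stop once the absolute value of that sum exceeds a given threshold.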

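import numpy as np
import pandas as pd

x = np.random.randint(0, 5, (100,))
df = pd.DataFrame(x, columns=["value"])
df_iter = pd.DataFrame(index=df.index)
max_iter = 5
threshold = 10

# one column per window length: df_iter[i] is the rolling sum over the last i values
for i in range(2, max_iter + 1):
    df_iter[i] = df["value"].rolling(i).sum()

# column position of the shortest window whose absolute sum exceeds the threshold
# (np.argmax returns 0 when no window matches, same as a match in the first column)
match_indices = np.argmax(df_iter.abs().values > threshold, axis=1)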

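The approach above does reach the goal, but it is somewhat clumsy and needs extra handling for items that never reach the threshold (for those rows np.argmax simply returns 0).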
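One way to handle the rows that never reach the threshold in the df_iter approach is to check whether any column matched before trusting argmax. This is a minimal sketch that assumes df_iter is laid out as in the question (one rolling-sum column per window length, shortest window first); the helper name signals_from_window_sums is hypothetical.

```python
import numpy as np
import pandas as pd

def signals_from_window_sums(df_iter, threshold):
    """Hypothetical helper: map each row of rolling sums (one column per window
    length, shortest window first) to -1/0/1 using the first sum whose absolute
    value exceeds threshold; rows where no sum exceeds it get 0."""
    # NaN (window longer than the history so far) becomes 0, which never exceeds threshold >= 0
    sums = df_iter.fillna(0).to_numpy(dtype=float)
    hit = np.abs(sums) > threshold
    first = hit.argmax(axis=1)  # 0 both for "first column matched" and "nothing matched"
    picked = sums[np.arange(len(sums)), first]
    # hit.any() separates the two cases that argmax alone cannot tell apart
    return np.where(hit.any(axis=1), np.sign(picked), 0).astype(int)

# Usage with df_iter built as in the question
x = np.random.randint(0, 5, (100,))
df = pd.DataFrame(x, columns=["value"])
df_iter = pd.DataFrame(index=df.index)
for i in range(2, 6):
    df_iter[i] = df["value"].rolling(i).sum()
print(signals_from_window_sums(df_iter, threshold=10))
```

Because the question's loop starts at range(2, ...), windows of length 1 are never checked; starting the range at 1 would include single values that exceed the threshold on their own.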

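Ultimately, I want a series of values in [-1, 0, 1]: each item is 1 if the positive threshold is exceeded within the maximum window, -1 if the negative threshold is exceeded, and 0 otherwise. So the output looks like the example below. Note that, because of the rolling nature, items tend to come in clusters. Again, the most important feature is finding the most recent occurrence of the threshold being exceeded.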

[0,1,1,1,0,0,-1,-1,-1,0,-1,-1,-1,-1,0,0,0,1,1,1,1]
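To pin the rule down, a plain-Python reference version might look like this: for each position, walk backwards through at most max_n values, accumulate the sum, and stop at the first sum whose absolute value exceeds threshold. This is a sketch of the specification as read from the question, not the asker's code, and the function name is hypothetical. It runs in O(n · max_n) time, so it is mainly useful for checking faster versions.

```python
def last_threshold_signals_loop(values, max_n, threshold):
    """Hypothetical reference: for each position, return the sign of the first
    trailing sum (over 1, 2, ..., max_n values) whose absolute value exceeds
    threshold, or 0 if none does."""
    signals = []
    for i in range(len(values)):
        total = 0
        signal = 0
        # k = 0 is the current value, k = 1 adds the previous one, and so on
        for k in range(min(max_n, i + 1)):
            total += values[i - k]
            if abs(total) > threshold:
                signal = 1 if total > 0 else -1
                break  # stop as soon as the condition is met
        signals.append(signal)
    return signals

print(last_threshold_signals_loop([3, 4, 5, -20, 2, 2], max_n=3, threshold=10))
# [0, 0, 1, -1, -1, -1]
```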

So, is there a way to do a rolling lookup like this in pandas?

It turns out that this is fairly easy with NumPy's cumsum function:

import numpy as np
import pandas as pd

data = np.random.randint(-10, 10, (100,))
df = pd.DataFrame(data, columns=["value"])
max_n = 10
threshold = 10

def get_last_threshold(x):
    # reverse the window so the most recent value comes first; the cumulative sum
    # then holds the sums of the last 1, 2, ..., len(x) values
    x = x.values[::-1].cumsum()
    # position of the first (shortest) trailing sum whose absolute value exceeds the threshold
    match = np.argmax(np.abs(x) > threshold)
    # map to [-1, 0, 1]: sign of that sum, zeroed out if it does not actually exceed
    # the threshold (np.argmax returns index 0 when there is no match)
    signal = np.sign(x[match]) * (np.abs(x[match]) > threshold).astype(int)
    return signal

# raw=False passes each window as a Series, which get_last_threshold expects (.values)
signals = df["value"].rolling(max_n, min_periods=1).apply(get_last_threshold, raw=False).values
print(signals)
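The rolling apply above calls a Python function once per row. A sketch of a fully vectorized NumPy version of the same cumsum idea follows, assuming NumPy 1.20 or later for numpy.lib.stride_tricks.sliding_window_view. It pads the series with max_n - 1 zeros so the first rows behave like min_periods=1, reverses each window, and takes the cumulative sum along each row. The function name last_threshold_signals is hypothetical, and it materializes an n × max_n array, trading memory for speed.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def last_threshold_signals(values, max_n, threshold):
    """Hypothetical vectorized equivalent of get_last_threshold applied with
    rolling(max_n, min_periods=1)."""
    values = np.asarray(values, dtype=float)
    # Leading zeros mimic min_periods=1: sums reaching into the padding
    # just repeat the sum of all available values
    padded = np.concatenate([np.zeros(max_n - 1), values])
    # Row i is the window of max_n values ending at values[i] (a read-only view)
    windows = sliding_window_view(padded, max_n)
    # Reverse each row so the most recent value comes first; cumsum along the row
    # gives the trailing sums over the last 1, 2, ..., max_n values
    trailing = windows[:, ::-1].cumsum(axis=1)
    hit = np.abs(trailing) > threshold
    first = hit.argmax(axis=1)  # shortest window that exceeds; 0 if none does
    picked = trailing[np.arange(len(values)), first]
    # Rows without any hit get 0 instead of the sign of argmax's default
    return np.where(hit.any(axis=1), np.sign(picked), 0).astype(int)

print(last_threshold_signals([3, 4, 5, -20, 2, 2], max_n=3, threshold=10).tolist())
# [0, 0, 1, -1, -1, -1]
```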

Example signal output:

array([ 0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1.,  1.,  1., -1.,
   -1.,  0.,  1.,  0.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0.,
   -1., -1., -1.,  1., -1., -1., -1., -1., -1., -1.,  0.,  1.,  1.,
    1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  0., -1.,  0.,  1.,  0.,
   -1., -1., -1., -1., -1., -1., -1., -1.,  0.,  1.,  0.,  1.,  0.,
   -1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
    1.,  1.,  1., -1., -1., -1., -1., -1., -1., -1.,  1.,  1.,  1.,
   -1., -1., -1.,  0.,  1.,  1.,  1.,  1.,  1.])

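Please provide the exact output format you want.

I have edited the post.

I mean: for a specific input, give exactly the result you expect. What are the positive and negative thresholds in the code?

There is already a threshold in the code, but to make it clearer I renamed the variable to threshold. Again, though, the code is just a clumsy, naive way of solving the problem.

What does "exceeding the negative threshold" mean here?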