Python: rolling window that returns an array

Here is some sample code:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
df['C'] = df.B.rolling(window=3)
Output:

           A         B                                       C
0 -0.108897  1.877987  Rolling [window=3,center=False,axis=0]
1 -1.276055 -0.424382  Rolling [window=3,center=False,axis=0]
2  1.578561 -1.094649  Rolling [window=3,center=False,axis=0]
3 -0.443294  1.683261  Rolling [window=3,center=False,axis=0]
4  0.674124  0.281077  Rolling [window=3,center=False,axis=0]
5  0.587773  0.697557  Rolling [window=3,center=False,axis=0]
6 -0.258038 -1.230902  Rolling [window=3,center=False,axis=0]
7 -0.443269  0.647107  Rolling [window=3,center=False,axis=0]
8  0.347187  0.753585  Rolling [window=3,center=False,axis=0]
9 -0.369179  0.975155  Rolling [window=3,center=False,axis=0]
I want my 'C' column to be an array like [0.1231, -1.132, 0.8766]. I tried using rolling apply, but without success.

Expected output:

       A         B                 C
0 -0.108897  1.877987  []
1 -1.276055 -0.424382  []
2  1.578561 -1.094649  [-1.094649, -0.424382, 1.877987]
3 -0.443294  1.683261  [1.683261, -1.094649, -0.424382]
4  0.674124  0.281077  [0.281077, 1.683261, -1.094649]
5  0.587773  0.697557  [0.697557, 0.281077, 1.683261]
6 -0.258038 -1.230902  [-1.230902, 0.697557, 0.281077]
7 -0.443269  0.647107  [0.647107, -1.230902, 0.697557]
8  0.347187  0.753585  [0.753585, 0.647107, -1.230902]
9 -0.369179  0.975155  [0.975155, 0.753585, 0.647107]

You can use NumPy's stride tricks (np.lib.stride_tricks.as_strided):

import numpy as np
import pandas as pd
as_strided = np.lib.stride_tricks.as_strided

df

          A         B
0 -0.272824 -1.606357
1 -0.350643  0.000510
2  0.247222  1.627117
3 -1.601180  0.550903
4  0.803039 -1.231291
5 -0.536713 -0.313384
6 -0.840931 -0.675352
7 -0.930186 -0.189356
8  0.151349  0.522533
9 -0.046146  0.507406

win = 3  # window size

# https://stackoverflow.com/a/47483615/4909087
v = as_strided(df.B, (len(df) - (win - 1), win), (df.B.values.strides * 2))

v
array([[ -1.60635669e+00,   5.10129842e-04,   1.62711678e+00],
       [  5.10129842e-04,   1.62711678e+00,   5.50902812e-01],
       [  1.62711678e+00,   5.50902812e-01,  -1.23129111e+00],
       [  5.50902812e-01,  -1.23129111e+00,  -3.13383794e-01],
       [ -1.23129111e+00,  -3.13383794e-01,  -6.75352179e-01],
       [ -3.13383794e-01,  -6.75352179e-01,  -1.89356194e-01],
       [ -6.75352179e-01,  -1.89356194e-01,   5.22532550e-01],
       [ -1.89356194e-01,   5.22532550e-01,   5.07405549e-01]])

df['C'] = pd.Series(v.tolist(), index=df.index[win - 1:])
df

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117  [-1.606356691642917, 0.0005101298424200881, 1....
3 -1.601180  0.550903  [0.0005101298424200881, 1.6271167809032248, 0....
4  0.803039 -1.231291  [1.6271167809032248, 0.5509028122535129, -1.23...
5 -0.536713 -0.313384  [0.5509028122535129, -1.2312911105674484, -0.3...
6 -0.840931 -0.675352  [-1.2312911105674484, -0.3133837943758246, -0....
7 -0.930186 -0.189356  [-0.3133837943758246, -0.6753521794378446, -0....
8  0.151349  0.522533  [-0.6753521794378446, -0.18935619377656243, 0....
9 -0.046146  0.507406  [-0.18935619377656243, 0.52253255045267, 0.507...
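
The same window matrix can also be attached so that each row looks ahead instead of back. The following is only a sketch under assumptions: the tiny DataFrame and the column names C_back and C_ahead are made up for illustration, and writeable=False is added as a precaution because as_strided returns a view that shares memory with the original column.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import as_strided

# Illustrative data (not the frame from the answer above)
df = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0, 5.0]})
win = 3  # window size

b = df["B"].to_numpy()
n_windows = len(b) - (win - 1)

# Each row of v is one window; read-only view to avoid accidental writes
v = as_strided(b, shape=(n_windows, win), strides=b.strides * 2, writeable=False)

# Look back: the window ending at row i is stored at row i
df["C_back"] = pd.Series(v.tolist(), index=df.index[win - 1:])

# Look ahead: the window starting at row i is stored at row i
df["C_ahead"] = pd.Series(v.tolist(), index=df.index[:n_windows])
```

Only the index that the list of windows is aligned to changes; rows without a complete window are left as NaN in both cases.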

Maybe zip can also help in your case, for example:

def get_list(x,m) : return list(zip(*(x[i:] for i in range(m))))

# get_list(df['B'],3) would return 

[(-1.606357, 0.0005099999999999999, 1.627117),
 (0.0005099999999999999, 1.627117, 0.5509029999999999),
 (1.627117, 0.5509029999999999, -1.231291),
 (0.5509029999999999, -1.231291, -0.313384),
 (-1.231291, -0.313384, -0.6753520000000001),
 (-0.313384, -0.6753520000000001, -0.189356),
 (-0.6753520000000001, -0.189356, 0.522533),
 (-0.189356, 0.522533, 0.507406)]

df['C'] = pd.Series(get_list(df['B'],3), index=df.index[3 - 1:])
# A little help from @coldspeed

print(df)

          A         B                                                  C
0 -0.272824 -1.606357                                                NaN
1 -0.350643  0.000510                                                NaN
2  0.247222  1.627117       (-1.606357, 0.0005099999999999999, 1.627117)
3 -1.601180  0.550903  (0.0005099999999999999, 1.627117, 0.5509029999...
4  0.803039 -1.231291          (1.627117, 0.5509029999999999, -1.231291)
5 -0.536713 -0.313384         (0.5509029999999999, -1.231291, -0.313384)
6 -0.840931 -0.675352        (-1.231291, -0.313384, -0.6753520000000001)
7 -0.930186 -0.189356        (-0.313384, -0.6753520000000001, -0.189356)
8  0.151349  0.522533         (-0.6753520000000001, -0.189356, 0.522533)
9 -0.046146  0.507406                    (-0.189356, 0.522533, 0.507406)

Let's use this approach with a rolling-apply trick:

df = pd.DataFrame(np.random.randn(10, 2), columns=list('AB'))
list_of_values = []
df.B.rolling(3).apply(lambda x: list_of_values.append(x.values) or 0, raw=False)
df.loc[2:,'C'] = pd.Series(list_of_values).values
df
Output:

          A         B                                                                  C
0  1.610085  0.354823                                                                NaN
1 -0.241446 -0.304952                                                                NaN
2  0.524812 -0.240972  [0.35482336179318674, -0.30495156795594963, -0.24097191924555197]
3  0.767354  0.281625   [-0.30495156795594963, -0.24097191924555197, 0.2816249674055174]
4 -0.349844 -0.533781    [-0.24097191924555197, 0.2816249674055174, -0.5337811449574766]
5 -0.174189  0.133795     [0.2816249674055174, -0.5337811449574766, 0.13379518286397707]
6  2.799437 -0.978349    [-0.5337811449574766, 0.13379518286397707, -0.9783488211443795]
7  0.250129  0.289782     [0.13379518286397707, -0.9783488211443795, 0.2897823417165459]
8 -0.385259 -0.286399    [-0.9783488211443795, 0.2897823417165459, -0.28639931887491943]
9 -0.755363 -1.010891    [0.2897823417165459, -0.28639931887491943, -1.0108913605575793]

Since pandas 1.1, rolling objects are iterable, so you can simply do:

df['C'] = list(df.B.rolling(window=3))
Or, if you want lists, you can do:

df['C'] = [window.to_list() for window in df.B.rolling(window=3)]

This is short, and you can use all the convenient parameters of the rolling function.
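
Note that iterating a rolling object also yields the shorter, incomplete windows at the start, so the first rows receive lists of length 1 and 2 rather than NaN. Below is a minimal sketch, assuming you want empty lists for those rows and the newest value first, as in the question's expected output. The sample values and the name win are made up for illustration.

```python
import pandas as pd

# Illustrative data (not the frame from the question)
df = pd.DataFrame({"B": [1.0, 2.0, 3.0, 4.0, 5.0]})
win = 3  # window size

windows = [
    # keep only full windows, reversed so the current row's value comes first
    w.to_list()[::-1] if len(w) == win else []
    for w in df["B"].rolling(window=win)
]
df["C"] = pd.Series(windows, index=df.index)
print(df)
```

Dropping the [::-1] keeps the values in their original (oldest first) order.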

In newer NumPy versions there is sliding_window_view, which gives the same arrays as as_strided() but with more transparent syntax:

import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view
x = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])
sliding_window_view(x, 3)
>>>
array([[1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
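Note, however, that pandas rolling adds some NaNs at the beginning (window size - 1 of them), because it uses padding. You can check this as follows: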

x.rolling(3).sum()

>>>
0     NaN
1     NaN
2     6.0
3     9.0
4    12.0
5    15.0
6    18.0
7    21.0
8    24.0
dtype: float64

sliding_window_view(x, 3).sum(axis=1)
>>>
array([ 6,  9, 12, 15, 18, 21, 24])
So the array that actually corresponds to the pandas result would be:

import numpy as np

c = np.array([[np.nan, np.nan, 1.],
              [np.nan, 1., 2.],
              [1., 2., 3.],
              [2., 3., 4.],
              [3., 4., 5.],
              [4., 5., 6.],
              [5., 6., 7.],
              [6., 7., 8.],
              [7., 8., 9.]])

c.sum(axis=1)
>>>
array([nan, nan,  6.,  9., 12., 15., 18., 21., 24.])
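
Rather than typing the padded array by hand, it can be built by prepending NaNs before taking the sliding view. This is only a sketch: the names padded and win are illustrative, and it assumes a float array (NaN cannot be stored in an integer array) and a NumPy version that provides sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.arange(1, 10, dtype=float)  # 1.0 ... 9.0
win = 3  # window size

# Prepend win - 1 NaNs so that there is one window per original element
padded = np.concatenate([np.full(win - 1, np.nan), x])
c = sliding_window_view(padded, win)

print(c.shape)        # one row per element of x
print(c.sum(axis=1))  # NaN for the first win - 1 rows, like x.rolling(3).sum()
```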
One more way:

df.join(pd.concat(df['B'].rolling(window=3),axis=1).apply(lambda x: x.dropna().tolist()).reset_index(drop=True).loc[2:].rename('C'))

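No, that isn't possible. Each rolling-window computation must return a single aggregated result. If your function can't guarantee that, you may need to consider other options.

Thanks. Is it possible using loc/iloc/ix/others? I want to look back over a window and get the array.

I'd like to see the actual function and the expected output for this data. What exactly are you trying to do? Obviously, I can't help you unless I know what your real function does.

I'm trying to multiply the array by a factor X. For example, with a factor of 1.5, [1,2,3,4] would return [1.5,3,4.5,6]. I tried to provide a clear DataFrame.

Why do you need a rolling window? How do the per-window computations relate to each other? How do they fit into the final DataFrame? For example, for a column [1,2,3,4,5,6,7,8] with window size = 3, what output do you expect? (This is very important information, and I'd also like you to edit it into your question… thanks)

Oh god, I've been sitting here this whole time trying to answer it.

@Bharath Yeah, I got some help from here:

Awesome, thanks! Does this also work for looking ahead, i.e. getting the next n rows as an array?

@revendar Hmm... I'm not 100% sure, but I think the stride_tricks code still applies. After that, you just need to figure out how to insert it into the DataFrame (e.g., how you would shift the data, and by how much). Hope that makes sense.

@cᴏʟᴅsᴘᴇᴇᴅ What do you think of the zip approach?

Nice trick. It also works with raw=True, in which case you can append x directly, without converting the Series to an np.array. I mean: df.B.rolling(3).apply(lambda x: list_of_values.append(x) or 0, raw=True)

Exactly what I needed. You can replace window.to_list() with any list and get a list of lists, which you can easily convert into a DataFrame, etc.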