Python: iterate over a DataFrame, selecting n rows and all columns at a time


So, I have a dataset that looks like this:

# Example
    0   1    2    3   4   5
0  18   1  -19  -16  -5  19
1  18   0  -19  -17  -6  19
2  17  -1  -20  -17  -6  19
3  18   1  -19  -16  -5  20
4  18   0  -19  -16  -5  20
Actual data:

[{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
 {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
 {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
 {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
 {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
 {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
 {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
 {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
 {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
The shape of the data above is (20, 6).

What I want to achieve is to apply a custom function to every column, over 4 rows at a time.

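For example:

  • First iteration -> f() applied to all columns of df.iloc[0:4] (rows 0-3)
  • Second iteration -> f() applied to all columns of df.iloc[4:8] (rows 4-7)
    and so on.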

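In other words, what I need is a rolling window of size 4 with a stride of 4.

With the data above, the result would be a DataFrame of shape (5, 6). For the sake of discussion, you can assume the custom function takes the mean of the 4 rows for each column.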
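A window of size 4 moved with a stride of 4 can be written almost literally in NumPy. The sketch below assumes NumPy 1.20 or later, which provides numpy.lib.stride_tricks.sliding_window_view; it uses only the first 8 rows of the data for brevity, and window and stride are illustrative names. Because they are separate parameters, the same code would also handle overlapping windows (stride smaller than window).

```python
import numpy as np
import pandas as pd

# First 8 rows of the data in the question (two full windows).
df = pd.DataFrame([
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
    {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
    {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
])

window, stride = 4, 4

# Read-only view of every length-`window` run of rows. The window axis is
# appended last, so the shape is (n_rows - window + 1, n_cols, window) = (5, 6, 4).
windows = np.lib.stride_tricks.sliding_window_view(df.to_numpy(), window, axis=0)

# Keep every `stride`-th window (starting at rows 0, 4, ...) and reduce over
# the window axis; .min, .max, etc. can be used in place of .mean.
result = pd.DataFrame(windows[::stride].mean(axis=-1), columns=df.columns)
print(result)
```

Windows that would run past the last row are never produced, so an incomplete final block is simply left out.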

What I've tried so far:

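  • I looked into rolling, but rolling doesn't do what I need: it always moves the window with a stride of 1.
  • I tried implementing it myself, but because the data is large I really need this to be optimized.
  • Here is the code: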

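    import pandas as pd

    # df_to_look_at2 is the (20, 6) DataFrame shown above.
    curr = 0
    res = []
    while curr < df_to_look_at2.shape[0]:
        # .ix was removed in pandas 1.0; .iloc is position-based and
        # end-exclusive, so curr:curr + 4 selects exactly 4 rows.
        look_at = df_to_look_at2.iloc[curr:curr + 4]
        curr += 4
        res.append(look_at.mean().values.tolist())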
    pd.DataFrame(res)
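
For comparison, rolling can still be given an effective stride of 4, as a sketch: compute the ordinary stride-1 rolling mean, then keep only the rows where a window ends on a block boundary. This gives the same numbers as the loop, but rolling still evaluates every window rather than every 4th one, so it mainly illustrates why rolling alone does not fit. The 8-row df below is a truncated copy of the data.

```python
import pandas as pd

# First 8 rows of the data in the question.
df = pd.DataFrame([
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
    {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
    {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
])

window = 4

# Ordinary stride-1 rolling mean: row i summarises rows i-3..i,
# and the first window-1 rows are NaN.
rolled = df.rolling(window).mean()

# Windows ending at rows 3, 7, ... are exactly the non-overlapping blocks
# 0-3, 4-7, ...; keep those and renumber the result 0, 1, ...
result = rolled.iloc[window - 1::window].reset_index(drop=True)
print(result)
```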
    

An additional thought: what if, instead of the mean, the function were min(), max(), or some other custom function…

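Rolling would be the right tool if you wanted a row to be considered more than once, in more than one window. Your windows don't overlap, though, so what you're really asking is how to group the rows by your stride, which you can do with np.arange and floor division: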

    import numpy as np

    window_size = 4
    # Row i gets label i // 4: rows 0-3 -> 0, rows 4-7 -> 1, and so on.
    grouper = np.arange(df.shape[0]) // window_size
    
    df.groupby(grouper).mean()
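
The same floor-division grouper also covers the follow-up about min(), max() or a custom function, since any groupby reduction can be used in place of mean(). A minimal sketch on the first 8 rows of the data; value_range is a hypothetical custom function used only for illustration.

```python
import numpy as np
import pandas as pd

# First 8 rows of the data in the question.
df = pd.DataFrame([
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
    {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
    {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
])

window_size = 4
grouper = np.arange(df.shape[0]) // window_size  # [0 0 0 0 1 1 1 1]

# Built-in reductions are available directly on the groupby object.
mins = df.groupby(grouper).min()
maxs = df.groupby(grouper).max()

# Several reductions at once: columns become a (column, statistic) MultiIndex.
summary = df.groupby(grouper).agg(["min", "max", "mean"])

# Hypothetical custom function; agg calls it on each column of each block.
def value_range(s: pd.Series):
    return s.max() - s.min()

ranges = df.groupby(grouper).agg(value_range)
print(ranges)
```

Named built-ins such as min and max run through pandas' optimized group reductions, while a Python callable like value_range is called once per column per block, so it will be slower on large data.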
    



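I think this kind of repeated computation really belongs in NumPy territory. You can use reshape to get the underlying array into the shape you need, and then run whatever computation you want on that array: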

    inp = [{0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
     {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
     {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
     {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
     {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
     {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
     {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
     {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
     {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
     {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
     {0: 18, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
     {0: 17, 1: 0, 2: -20, 3: -16, 4: -5, 5: 19},
     {0: 17, 1: 0, 2: -19, 3: -16, 4: -4, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -15, 4: -4, 5: 20},
     {0: 18, 1: 0, 2: -19, 3: -14, 4: -3, 5: 22},
     {0: 18, 1: 1, 2: -18, 3: -14, 4: -4, 5: 22}]
    
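    import pandas as pd
    df = pd.DataFrame(inp)

    # (20, 6) -> (5, 4, 6): 5 blocks of 4 consecutive rows, 6 columns each.
    # This requires the number of rows to be a multiple of 4.
    temp = df.values.reshape(-1, 4, df.shape[-1])

    # Average over the 4 rows inside each block (axis=1).
    out = pd.DataFrame(temp.mean(axis=1))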
    
Output:

           0     1      2      3     4      5
    0  17.75  0.25 -19.25 -16.50 -5.50  19.25
    1  18.25  0.25 -19.00 -16.00 -5.25  19.50
    2  17.75  0.25 -19.25 -16.75 -5.75  19.00
    3  17.75  0.25 -19.00 -16.00 -4.75  19.75
    4  17.75  0.25 -18.75 -14.75 -3.75  21.00
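
The reshape approach extends in the same way to other reductions that accept an axis argument, such as np.min, np.max or np.ptp. The sketch below wraps it in a hypothetical helper, block_reduce, and, as an assumption about the desired behavior, drops trailing rows that do not fill a complete block, because reshape raises a ValueError when the row count is not a multiple of the block size. It uses the first 10 rows of the data, so the last 2 rows are discarded.

```python
import numpy as np
import pandas as pd

# First 10 rows of the data in the question: two full blocks of 4
# plus 2 leftover rows.
df = pd.DataFrame([
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 19},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -6, 5: 19},
    {0: 17, 1: -1, 2: -20, 3: -17, 4: -6, 5: 19},
    {0: 18, 1: 1, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -20, 3: -15, 4: -4, 5: 20},
    {0: 19, 1: 1, 2: -18, 3: -16, 4: -5, 5: 20},
    {0: 18, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
    {0: 18, 1: 0, 2: -20, 3: -18, 4: -7, 5: 18},
    {0: 17, 1: 0, 2: -19, 3: -17, 4: -7, 5: 18},
])


def block_reduce(frame: pd.DataFrame, size: int, func=np.mean) -> pd.DataFrame:
    """Apply `func` to consecutive, non-overlapping blocks of `size` rows.

    Hypothetical helper. Trailing rows that do not fill a whole block are
    dropped, because reshape needs the row count to be a multiple of `size`.
    `func` must accept an `axis` keyword (np.mean, np.min, np.max, np.ptp, ...).
    """
    n_full = (len(frame) // size) * size  # rows that fit into full blocks
    # (n_blocks, size, n_cols): one slab per block of `size` rows
    blocks = frame.to_numpy()[:n_full].reshape(-1, size, frame.shape[1])
    return pd.DataFrame(func(blocks, axis=1), columns=frame.columns)


print(block_reduce(df, 4))          # per-block means, shape (2, 6)
print(block_reduce(df, 4, np.max))  # per-block maxima
print(block_reduce(df, 4, np.ptp))  # per-block max - min
```

Dropping the remainder is only one choice; padding the last block or reducing it separately would be equally valid if the leftover rows matter.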
    

It looks like you might just want to group by every four rows. Perhaps:

    df.groupby(np.arange(df.shape[0]) // 4)