Python: check all row combinations to find the rows with the maximum average on a specific DataFrame column

python pandas dataframe

I have a DataFrame like this:

df
col1    col2    ind
 A        1      0
 B        2      1
 C        10     2
 D        5      3
 E        11     4
 F        4      5
 G        7      6
 H        20     7
 I        33     8
 J        24     9
 K        22     10
 L        5      11
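
For anyone who wants to reproduce the question, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes `ind` is an ordinary column (not the index) holding the row positions 0 to 11, as the printed table suggests.

```python
import pandas as pd

# Rebuild the sample frame from the table above.
# Assumption: "ind" is a regular column with the positions 0..11.
df = pd.DataFrame({
    "col1": list("ABCDEFGHIJKL"),
    "col2": [1, 2, 10, 5, 11, 4, 7, 20, 33, 24, 22, 5],
    "ind": list(range(12)),
})

print(df)
```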

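Now I want to create windows whose minimum size is 1/4 of the DataFrame length and whose maximum size is the full length of the DataFrame.

I want to try every such window with a step size of 1 and compute the mean of col2 inside it. I want to see which window has the largest mean.

I used the code below, and it gives the correct result. But because it uses nested loops, the execution time is poor.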

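import numpy as np

start_index = []
stop_index = []
average = []
min_window = len(df) // 4  # smallest window: a quarter of the rows
# window sizes from min_window up to the full length (inclusive)
for i in range(min_window, len(df) + 1):
    # every start position for a window of size i, step 1 (last window included)
    for j in range(0, len(df) - i + 1):
        t_df = df.iloc[j:j + i, :]
        avg = np.mean(t_df.col2.values)
        start_index.append(t_df.ind.values[0])
        stop_index.append(t_df.ind.values[-1])
        average.append(avg)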

# now we can find the rows with max average from the indices.
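
The final lookup mentioned in the comment above could look something like this. It is a minimal sketch, not the asker's actual code: to stay self-contained it rebuilds the three lists for windows of size 3 only, using the col2 values from the question, and the names `best`, `best_start` and `best_stop` are made up for illustration.

```python
import numpy as np

col2 = [1, 2, 10, 5, 11, 4, 7, 20, 33, 24, 22, 5]

# Stand-ins for the lists built by the loop above (window size 3 only).
start_index = list(range(len(col2) - 2))
stop_index = [s + 2 for s in start_index]
average = [np.mean(col2[s:e + 1]) for s, e in zip(start_index, stop_index)]

# np.argmax returns the position of the largest mean (first one on ties).
best = int(np.argmax(average))
best_start, best_stop = start_index[best], stop_index[best]

print(best_start, best_stop, average[best])
```

With the question's df, and assuming `ind` matches the row position, the corresponding rows would be `df.iloc[best_start:best_stop + 1]`.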

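Because the for loops make the execution slow, I am looking for some pandas/Python tricks to do the same task as efficiently as possible.

One idea is to use a rolling-window solution based on NumPy strides: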

import numpy as np

def rolling_window(a, window):
    # 2D view of a: one row per window, built with strides (no copy)
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

min_window = len(df) // 4
# window sizes from min_window up to the full length (inclusive)
for i in range(min_window, len(df) + 1):
    avg = rolling_window(df['col2'].values, i).mean(axis=1)
    ind = rolling_window(df['ind'].values, i)
    start = ind[:, 0]
    end = ind[:, -1]
    max_idx = np.argmax(avg)  # position of the window with the largest mean

    print(f'start is {start[max_idx]}, end is {end[max_idx]} max is {avg[max_idx]}')

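How it works: rolling_window creates a 2D array based on the window size, with one row per window. You can therefore get the mean of each window by taking the mean along axis=1, and get the start and end of each window by selecting the first and last 'columns' of the windowed ind array. np.argmax then gives the position of the largest mean, and that position is used to index start and end. For example, with a window size of 3:
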
print (rolling_window(df['col2'].values, 3))
[[ 1  2 10]
 [ 2 10  5]
 [10  5 11]
 [ 5 11  4]
 [11  4  7]
 [ 4  7 20]
 [ 7 20 33]
 [20 33 24]
 [33 24 22]
 [24 22  5]]
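
As a possible variation, the same 2D windows can be built with `numpy.lib.stride_tricks.sliding_window_view`, which is available in NumPy 1.20 and later and avoids computing strides by hand. The sketch below is not part of the original answer. It assumes the col2 values from the question, assumes `ind` is simply the row position, and keeps a single overall best window across all sizes instead of printing one result per size; the `best` tuple is a name chosen here for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

col2 = np.array([1, 2, 10, 5, 11, 4, 7, 20, 33, 24, 22, 5])
ind = np.arange(len(col2))  # assumption: ind equals the row position

min_window = len(col2) // 4
best = None  # (mean, start, end) of the best window seen so far

# Window sizes from min_window up to the full length (inclusive).
for size in range(min_window, len(col2) + 1):
    # One row per window of this size; mean along axis=1 is the window mean.
    means = sliding_window_view(col2, size).mean(axis=1)
    pos = int(np.argmax(means))
    candidate = (float(means[pos]), int(ind[pos]), int(ind[pos + size - 1]))
    # Strict comparison keeps the smaller window when two means tie.
    if best is None or candidate[0] > best[0]:
        best = candidate

print(f'best mean is {best[0]}, start is {best[1]}, end is {best[2]}')
```

This still loops over the window sizes, but each size is handled with vectorized NumPy operations rather than a Python loop over start positions.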