Python 3.x: how to take the mean of the values before the flag changes from 0 to 1
Tags: python-3.x, pandas, numpy, pandas-groupby

I have a DataFrame with the columns A, B and flag. I want to compute the mean of the 2 values before the flag changes from 0 to 1, record the value when the flag changes from 0 to 1, and record the value when the flag changes from 1 to 0.
import pandas as pd

# Input dataframe
df = pd.DataFrame({'A': [1, 3, 4, 7, 8, 11, 1, 15, 20, 15, 16, 87],
                   'B': [1, 3, 4, 6, 8, 11, 1, 19, 20, 15, 16, 87],
                   'flag': [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]})
# Expected output (kept in its own variable so the input df is not overwritten)
df_out = pd.DataFrame({'A_mean_before_flag_change': [5.5],
                       'B_mean_before_flag_change': [5],
                       'A_value_before_change_flag': [7],
                       'B_value_before_change_flag': [6]})
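I assume this needs to work when there are multiple rising edges, and that the values and means for each edge are appended to the output lists: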
# the first step is to extract the rising and falling edges using diff(), identify sections and length
df['flag_diff'] = df.flag.diff().fillna(0)
df['flag_sections'] = (df.flag_diff != 0).cumsum()
df['flag_sum'] = df.flag.groupby(df.flag_sections).transform('sum')
# then you can get the relevant indices by checking for the rising edges
rising_edges = df.index[df.flag_diff==1.0]
val_indices = [i-1 for i in rising_edges]
avg_indices = [(i-2,i-1) for i in rising_edges]
# and finally iterate over the relevant sections
df_out = pd.DataFrame()
df_out['A_mean_before_flag_change'] = [df.A.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['B_mean_before_flag_change'] = [df.B.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['A_value_before_change_flag'] = [df.A.loc[idx] for idx in val_indices]
df_out['B_value_before_change_flag'] = [df.B.loc[idx] for idx in val_indices]
df_out['length'] = [df.flag_sum.loc[idx] for idx in rising_edges]
df_out.index = rising_edges
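The window in the code above is fixed at two rows (`i-2` to `i-1`). As a minimal sketch of how the same idea might take a configurable window, the helper below uses a hypothetical function name `summarize_before_rising_edges` and a hypothetical parameter `n`; neither comes from the original answer. It works on row positions rather than index labels, and it clips the window at the start of the frame, which is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def summarize_before_rising_edges(df, cols=('A', 'B'), n=2):
    """Hypothetical helper: one output row per 0 -> 1 transition of df['flag']."""
    # A rising edge is a row with flag == 1 whose previous row has flag == 0.
    is_edge = df['flag'].eq(1) & df['flag'].shift(1).eq(0)
    positions = np.flatnonzero(is_edge.to_numpy())

    columns = ['edge']
    for c in cols:
        columns += [f'{c}_mean_before_flag_change', f'{c}_value_before_change_flag']

    rows = []
    for pos in positions:
        # Up to n rows directly before the edge, clipped at the first row.
        window = df.iloc[max(pos - n, 0):pos]
        row = {'edge': df.index[pos]}
        for c in cols:
            row[f'{c}_mean_before_flag_change'] = window[c].mean()
            row[f'{c}_value_before_change_flag'] = window[c].iloc[-1]
        rows.append(row)
    # Passing columns explicitly keeps the result well-formed when no edge exists.
    return pd.DataFrame(rows, columns=columns).set_index('edge')


df = pd.DataFrame({'A': [1, 3, 4, 7, 8, 11, 1, 15, 20, 15, 16, 87],
                   'B': [1, 3, 4, 6, 8, 11, 1, 19, 20, 15, 16, 87],
                   'flag': [0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]})
result = summarize_before_rising_edges(df, n=2)
print(result)
```

With the question's input there is a single rising edge at index 4, so `result` has one row. Calling the helper with `n=8` would average only the four rows that exist before that edge, because of the clipping.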
I tried to create a more general solution:
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'flag':[0,0,0,0,1,1,1,0,0,1,0,1]})
print (df)
     A   B  flag
0    1   1     0
1    3   3     0
2    4   4     0
3    7   6     0
4    8   8     1
5   11  11     1
6    1   1     1
7   15  19     0
8   20  20     0
9   15  15     1
10  16  16     0
11  87  87     1
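First, create groups using a mask that marks each 0 whose next flag value is 1.
Then filter out the groups whose size is smaller than N.
Keep only the last N rows of each remaining group.
Finally, aggregate the mean (and the last value) per group.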
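The four steps above could be sketched as follows. This is a reconstruction under assumptions, not necessarily the answer's original code: the reversed cumulative sum used to build the group label `g`, the value `N = 4`, and the flattened column names are choices made here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 4, 7, 8, 11, 1, 15, 20, 15, 16, 87],
                   'B': [1, 3, 4, 6, 8, 11, 1, 19, 20, 15, 16, 87],
                   'flag': [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1]})

N = 4  # assumed window size

# Step 1: mark each 0 that is directly followed by a 1, then label groups
# with a cumulative sum taken from the bottom of the frame upwards, so every
# group ends on the last row before a rising edge.
last_zero = df['flag'].eq(0) & df['flag'].shift(-1).eq(1)
df['g'] = last_zero.iloc[::-1].cumsum()

# Step 2: drop groups that have fewer than N rows.
df1 = df[df.groupby('g')['g'].transform('size').ge(N)]

# Step 3: keep only the last N rows of each remaining group.
df2 = df1.groupby('g').tail(N)

# Step 4: mean and last value per group, with flattened column names.
out = df2.groupby('g')[['A', 'B']].agg(['mean', 'last'])
out.columns = [f'{col}_{stat}' for col, stat in out.columns]
print(out)
```

Note that a group built this way spans everything between two rising edges, so its last N rows can include rows where the flag was still 1 (here, group 2 covers rows 4 to 8). That is the overlap raised in the comments.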
Comments:

The result should have one row in the dataframe for each rising edge.

In that case, just put the lists from the answer into a dataframe. If you need the index of each rising edge so that you can easily match a result to the correct edge, you need to copy it into the resulting dataframe; see also the edit.

Everything else seems fine now. One question: I also need to record the length of each rising edge. Could you add that?

Edited to include the length of the section after the rising edge.

This solution works with a shift mask over the previous 2 rows. Which part needs to change if I need the previous 8 rows?

@Edward - Unfortunately, the whole solution. Please give me some time.

@Edward - Is it a problem if groups overlap? With the sample data in my answer and N=8, the last group overlaps. Or is that impossible in the real data?

In the ideal case, overlap is not allowed.

@Edward - So you need a solution for ideal data, meaning no overlapping groups?
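# df1 is the frame left after the size filter, with the group label in column 'g'
# keep the last N rows of each group
df2 = df1.groupby('g').tail(N)
d = {'mean': '_mean_before_flag_change', 'last': '_value_before_change_flag'}
# mean and last value per group; the columns are selected with a list,
# because indexing a groupby with a bare tuple is rejected by current pandas
df3 = df2.groupby('g')[['A', 'B']].agg(['mean', 'last']).sort_index(axis=1, level=1).rename(columns=d)
# flatten the two-level column index, e.g. ('A', '_mean_before_flag_change') -> 'A_mean_before_flag_change'
df3.columns = df3.columns.map(''.join)
print(df3)
   A_value_before_change_flag  B_value_before_change_flag  \
g
2                          20                          20
3                           7                           6

   A_mean_before_flag_change  B_mean_before_flag_change
g
2                      11.75                      12.75
3                       3.75                       3.50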