Python: calculating average consumption from the data in a DataFrame


I have a DataFrame, and I need to calculate the average consumption for each engine.

    import pandas as pd

    iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51', '2021-03-04 12:50:24',
                             '2021-03-04 13:02:02', '2021-03-04 14:37:23']]
    control_id = [1, 2, 3, 4, 5]
    index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
    steps = [354815, 355160, 355428, 357850, 358314]
    quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
    full = [1, 0, 0, 1, 0]
    # Named `data` rather than `dict` so the built-in type is not shadowed
    data = {'CONTROL_ID': control_id, 'STEPS': steps, 'QUANTITY': quantity, 'FULL': full}
    df = pd.DataFrame(data, index=index)

The resulting DataFrame:

    ENGINE_ID  TIME                 CONTROL_ID  STEPS   QUANTITY  FULL
    A123B      2021-03-04 10:10:17           1  354815    156.32     1
               2021-03-04 11:18:51           2  355160     85.49     0
               2021-03-04 12:50:24           3  355428    100.00     0
               2021-03-04 13:02:02           4  357850    157.02     1
               2021-03-04 14:37:23           5  358314    134.00     0

First, take the cumulative sum of the quantity, then select only the rows where the engine is full (FULL == 1).
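
A minimal sketch of this step, rebuilding the sample data from the question so that it runs on its own. The column name `cum` is taken from the `dffull` output shown further down; the `.copy()` call is an assumption added here so that later column assignments on `dffull` do not touch a view of `df`.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["A123B"], ["2021-03-04 10:10:17", "2021-03-04 11:18:51", "2021-03-04 12:50:24",
                 "2021-03-04 13:02:02", "2021-03-04 14:37:23"]],
    names=["ENGINE_ID", "TIME"],
)
df = pd.DataFrame(
    {
        "CONTROL_ID": [1, 2, 3, 4, 5],
        "STEPS": [354815, 355160, 355428, 357850, 358314],
        "QUANTITY": [156.32, 85.49, 100.00, 157.02, 134.00],
        "FULL": [1, 0, 0, 1, 0],
    },
    index=index,
)

# Running total of the quantity over all rows
df["cum"] = df["QUANTITY"].cumsum()

# Keep only the rows where the engine is full; copy so dffull is independent of df
dffull = df[df["FULL"] == 1].copy()
print(dffull[["STEPS", "cum"]])
```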

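Compute the consumption for each interval by dividing the deltas as numpy arrays (the deltas are obtained by subtracting the array from itself shifted by one index).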
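
The delta division can be sketched as follows. The two arrays hold the STEPS and cumulative-quantity values of the two FULL == 1 rows from the sample data; the names `steps_full` and `cum_full` are illustrative, not taken from the answer.

```python
import numpy as np

# Values at the FULL == 1 rows of the sample data
steps_full = np.array([354815, 357850])
cum_full = np.array([156.32, 498.83])

# Slicing [1:] against [:-1] subtracts each element from its successor,
# so the result has one element fewer than the inputs.
step_delta = steps_full[1:] - steps_full[:-1]      # 3035 steps
quantity_delta = cum_full[1:] - cum_full[:-1]      # about 342.51
consumption = step_delta / quantity_delta

print(consumption)  # a single value, roughly 8.861
```

With the sample data this gives one value, which agrees with the 8.861055 shown in the `dffull` output.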

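Now assign the result. Because the consumption array is one element shorter than dffull, the first element is set to 0:

    dffull["consumption"] = [0] + list(consumption)

This is what dffull looks like: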

                               CONTROL_ID   STEPS  ...     cum  consumption
ENGINE_ID TIME                                     ...                     
A123B     2021-03-04 10:10:17           1  354815  ...  156.32     0.000000
          2021-03-04 13:02:02           4  357850  ...  498.83     8.861055
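
Finally, create a consumption column in df, initialised to 0, and assign the computed values to the FULL == 1 rows. Using .loc for the assignment avoids the chained-assignment warning that df["consumption"][df.FULL == 1] = ... triggers:

    df["consumption"] = 0.0
    df.loc[df.FULL == 1, "consumption"] = dffull["consumption"]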

I'm not sure this is the best solution, but I would use a series of shift operations, as follows:

    import numpy as np
    import pandas as pd

    df['QUANT'] = df['QUANTITY'].shift(-1)  # QUANTITY of the following row
    df['GROUP'] = df['FULL'].cumsum()  # Group number that increments at each FULL == 1 row

    # Keep the first row of each group; copy() makes df2 an independent frame
    df2 = df.drop_duplicates(subset=['GROUP'], keep='first').copy()
    df2['NEXT_STEPS'] = df2['STEPS'].shift(-1)  # STEPS at the start of the next group
    df2['DIFF'] = df2['NEXT_STEPS'] - df2['STEPS']  # Step difference, e.g. 357850 - 354815
    df = pd.merge(df.reset_index(), df2[['DIFF', 'GROUP']], on='GROUP')  # Merge back into df

    # Sum QUANT per group and merge with df; the summed column is suffixed QUANT_y
    df = pd.merge(df, df.groupby('GROUP')['QUANT'].sum().reset_index(), on='GROUP')
    df['AVERAGE'] = (df['DIFF'] / df['QUANT_y']).shift(1)  # Calculate the average
    df['AVERAGE'] = np.where(df['FULL'] == 1, df['AVERAGE'], 0)  # Keep it only where FULL == 1
    df['AVERAGE'] = df['AVERAGE'].fillna(0)  # Replace any NaN with 0
    df = df[['ENGINE_ID', 'TIME', 'CONTROL_ID', 'STEPS', 'QUANTITY', 'FULL', 'AVERAGE']]

To get a better understanding of what is happening, I suggest breaking it down and printing the intermediate results.
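
As one way of doing that, the sketch below prints the two intermediate columns built in the first lines of the answer, using standalone Series with the sample values (the variable names are illustrative).

```python
import pandas as pd

full = pd.Series([1, 0, 0, 1, 0], name="FULL")
quantity = pd.Series([156.32, 85.49, 100.00, 157.02, 134.00], name="QUANTITY")

# cumsum() increases by one at every FULL == 1 row, so it labels each refill interval
group = full.cumsum()
print(group.tolist())  # [1, 1, 1, 2, 2]

# shift(-1) moves each quantity up one row; the last row has no successor and becomes NaN
quant = quantity.shift(-1)
print(quant.tolist())
```

Shifting QUANTITY up by one row means that the sum per group covers the quantities added after one full row up to and including the next full row.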

Let's try the following:

    import pandas as pd
    import numpy as np

    iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51',
                             '2021-03-04 12:50:24', '2021-03-04 13:02:02',
                             '2021-03-04 14:37:23']]
    control_id = [1, 2, 3, 4, 5]
    index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
    steps = [354815, 355160, 355428, 357850, 358314]
    quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
    full = [1, 0, 0, 1, 0]
    d = {'CONTROL_ID': control_id, 'STEPS': steps, 'QUANTITY': quantity, 'FULL': full}
    df = pd.DataFrame(d, index=index)

    # Boolean index of where FULL == 1
    full_m = df.FULL.eq(1)
    # Get the values needed to compute the average for each group between full rows
    sums = df.assign(
        # Difference in STEPS between this row and the previous FULL == 1 row
        STEP_DIFF=df.loc[full_m, 'STEPS'] - df.loc[full_m, 'STEPS'].shift()
    ).groupby(
        # Create groups that start at the row after a FULL == 1 and end at the next FULL == 1
        df.FULL.shift().cumsum().fillna(0)
    )[['STEP_DIFF', 'QUANTITY']].transform('sum')
    # Put in the average, or 0
    df['AVERAGE'] = np.where(full_m, sums.STEP_DIFF / sums.QUANTITY, 0)
    # Display
    print(df.to_string())

Output:

                                   CONTROL_ID   STEPS  QUANTITY  FULL   AVERAGE
    ENGINE_ID TIME
    A123B     2021-03-04 10:10:17           1  354815    156.32     1  0.000000
              2021-03-04 11:18:51           2  355160     85.49     0  0.000000
              2021-03-04 12:50:24           3  355428    100.00     0  0.000000
              2021-03-04 13:02:02           4  357850    157.02     1  8.861055
              2021-03-04 14:37:23           5  358314    134.00     0  0.000000
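
The sample data contains a single engine. If the frame held several engines, the same calculation would presumably need to be done separately for each ENGINE_ID so that deltas are not taken across engines. The following is only a sketch under that assumption: the helper `add_average` and the second engine ID `B456C` are hypothetical and do not come from the question or the answers.

```python
import pandas as pd

def add_average(engine_df):
    """Return a copy of one engine's rows with an AVERAGE column added."""
    out = engine_df.copy()
    full_mask = out["FULL"].eq(1)
    cum = out["QUANTITY"].cumsum()
    # Deltas between consecutive FULL == 1 rows; the first one has no predecessor (NaN)
    step_delta = out.loc[full_mask, "STEPS"].diff()
    quantity_delta = cum[full_mask].diff()
    # Spread the result back over all rows; rows without a value get 0
    out["AVERAGE"] = (step_delta / quantity_delta).reindex(out.index).fillna(0.0)
    return out

times = ["2021-03-04 10:10:17", "2021-03-04 11:18:51", "2021-03-04 12:50:24",
         "2021-03-04 13:02:02", "2021-03-04 14:37:23"]
# "B456C" is a made-up second engine that reuses the sample values
index = pd.MultiIndex.from_product([["A123B", "B456C"], times],
                                   names=["ENGINE_ID", "TIME"])
df = pd.DataFrame(
    {
        "STEPS": [354815, 355160, 355428, 357850, 358314] * 2,
        "QUANTITY": [156.32, 85.49, 100.00, 157.02, 134.00] * 2,
        "FULL": [1, 0, 0, 1, 0] * 2,
    },
    index=index,
)

# Process each engine's rows independently, then stitch the pieces back together
result = pd.concat([add_average(g) for _, g in df.groupby(level="ENGINE_ID")])
print(result.to_string())
```

Iterating over the groups and concatenating keeps the original MultiIndex intact, which makes the result easy to compare with the single-engine output above.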