Python: calculate average consumption from data in a DataFrame

python pandas

I have a DataFrame and need to calculate the average consumption for each engine.

    import pandas as pd

    iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51', '2021-03-04 12:50:24',
                             '2021-03-04 13:02:02', '2021-03-04 14:37:23']]
    control_id = [1, 2, 3, 4, 5]
    index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
    steps = [354815, 355160, 355428, 357850, 358314]
    quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
    full = [1, 0, 0, 1, 0]
    # Named `data` rather than `dict` to avoid shadowing the built-in
    data = {'CONTROL_ID': control_id, 'STEPS': steps, 'QUANTITY': quantity, 'FULL': full}
    df = pd.DataFrame(data, index=index)

                               CONTROL_ID   STEPS  QUANTITY  FULL
ENGINE_ID TIME
A123B     2021-03-04 10:10:17           1  354815    156.32     1
          2021-03-04 11:18:51           2  355160     85.49     0
          2021-03-04 12:50:24           3  355428    100.00     0
          2021-03-04 13:02:02           4  357850    157.02     1
          2021-03-04 14:37:23           5  358314    134.00     0

First, take the cumulative sum of the quantity, then select only the rows where the engine is full (FULL == 1).
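A minimal sketch of this step, assuming the cumulative column is called `cum` (the name that appears in the `dffull` output in this answer) and that the full rows are selected with a boolean mask:

```python
import pandas as pd

# Rebuild the DataFrame from the question
index = pd.MultiIndex.from_product(
    [["A123B"], ["2021-03-04 10:10:17", "2021-03-04 11:18:51", "2021-03-04 12:50:24",
                 "2021-03-04 13:02:02", "2021-03-04 14:37:23"]],
    names=["ENGINE_ID", "TIME"],
)
df = pd.DataFrame(
    {"CONTROL_ID": [1, 2, 3, 4, 5],
     "STEPS": [354815, 355160, 355428, 357850, 358314],
     "QUANTITY": [156.32, 85.49, 100.00, 157.02, 134.00],
     "FULL": [1, 0, 0, 1, 0]},
    index=index,
)

# Running total of QUANTITY over all rows
df["cum"] = df["QUANTITY"].cumsum()

# Keep only the FULL == 1 rows; .copy() makes dffull an independent frame,
# so columns can be added to it later without a SettingWithCopyWarning
dffull = df[df["FULL"] == 1].copy()
print(dffull)
```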

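Compute the consumption for each interval by dividing the increments as numpy arrays: the difference in STEPS over the difference in cumulative quantity, each obtained by subtracting the array shifted by one index.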
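One way to write that division, as a sketch. The small `dffull` below is a stand-in holding only the `STEPS` and `cum` values of the two full rows, copied from the table in this answer:

```python
import numpy as np
import pandas as pd

# Stand-in for dffull: the two FULL == 1 rows with their cumulative quantity
dffull = pd.DataFrame({"STEPS": [354815, 357850], "cum": [156.32, 498.83]})

steps = dffull["STEPS"].to_numpy()
cum = dffull["cum"].to_numpy()

# Element i is (steps[i+1] - steps[i]) / (cum[i+1] - cum[i]),
# so the result has one element fewer than dffull
consumption = (steps[1:] - steps[:-1]) / (cum[1:] - cum[:-1])
print(consumption)
```

Here the single interval gives 3035 steps over a quantity of 342.51, which is the 8.861055 shown in the output.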

Now assign the result. Because the consumption array is one element shorter than dffull, the first element is set to 0.

dffull["consumption"]=[0]+list(consumption)

This is what dffull looks like:

                               CONTROL_ID   STEPS  ...     cum  consumption
ENGINE_ID TIME                                     ...                     
A123B     2021-03-04 10:10:17           1  354815  ...  156.32     0.000000
          2021-03-04 13:02:02           4  357850  ...  498.83     8.861055

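Finally, create a consumption column in df, initialized to 0, and assign the computed values to the rows where FULL == 1. The assignment goes through .loc, because the chained form df["consumption"][df.FULL==1] = ... raises a SettingWithCopyWarning and may not modify df at all.

df["consumption"] = 0.0
df.loc[df.FULL == 1, "consumption"] = dffull.consumption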

I'm not sure this is the best solution, but I would use a series of shift operations, like this:

import numpy as np

df['QUANT'] = df['QUANTITY'].shift(-1) # Shift QUANTITY by 1
df['GROUP'] = df['FULL'].cumsum() # Get a group number which increments when a 1 occurs in the FULL column

df2 = df.drop_duplicates(subset=['GROUP'], keep='first').copy() # Keep the first row of each group; copy so new columns can be added without a SettingWithCopyWarning
df2['NEXT_STEPS'] = df2['STEPS'].shift(-1) # Shift the STEPS column by 1
df2['DIFF'] = df2['NEXT_STEPS'] - df2['STEPS'] # Get the difference between the previous and next steps which is 357850 - 354815
df = pd.merge(df.reset_index(), df2[['DIFF', 'GROUP']], on='GROUP') # Merge it with the original df


df = pd.merge(df, df.groupby('GROUP')['QUANT'].sum().reset_index(), on='GROUP') # Get the QUANTITY sum for each group and merge with original df
df['AVERAGE'] = (df['DIFF']/df['QUANT_y']).shift(1) # Calculate the AVERAGE
df['AVERAGE'] = np.where(df['FULL']==1, df.AVERAGE, 0) # Replace AVERAGE column with 0 where FULL is not 1 else keep it
df['AVERAGE'] = df['AVERAGE'].fillna(0) # Replace any nan with 0
df = df[['ENGINE_ID', 'TIME', 'CONTROL_ID', 'STEPS', 'QUANTITY', 'FULL', 'AVERAGE']]

To get a better understanding of what is happening, I suggest breaking it down and printing the intermediate results.
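For instance, the grouping key and the per-group quantity totals can be inspected on their own. A small sketch that uses only the FULL and QUANTITY values from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "QUANTITY": [156.32, 85.49, 100.00, 157.02, 134.00],
    "FULL": [1, 0, 0, 1, 0],
})

# cumsum() over the 0/1 flag increases by one at every FULL == 1 row,
# so each full row starts a new group
df["GROUP"] = df["FULL"].cumsum()

# shift(-1) gives each row the quantity of the row below it, so a group's
# sum covers the rows after its full row up to and including the next full row
df["QUANT"] = df["QUANTITY"].shift(-1)

group_totals = df.groupby("GROUP")["QUANT"].sum()
print(df)
print(group_totals)
```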

Let's try the following:

import pandas as pd
import numpy as np

iterables = [['A123B'], ['2021-03-04 10:10:17', '2021-03-04 11:18:51',
                         '2021-03-04 12:50:24', '2021-03-04 13:02:02',
                         '2021-03-04 14:37:23']]
control_id = [1, 2, 3, 4, 5]
index = pd.MultiIndex.from_product(iterables, names=["ENGINE_ID", "TIME"])
steps = [354815, 355160, 355428, 357850, 358314]
quantity = [156.32, 85.49, 100.00, 157.02, 134.00]
full = [1, 0, 0, 1, 0]
d = {'CONTROL_ID': control_id, 'STEPS': steps, 'QUANTITY': quantity, 'FULL': full}
df = pd.DataFrame(d, index=index)

# Boolean index of where FULL == 1
full_m = df.FULL.eq(1)
# Get the values needed to calculate the average of each group between FULLs
sums = df.assign(
    # Difference between this row and the previous FULL == 1 row
    STEP_DIFF=df.loc[full_m, 'STEPS'] - df.loc[full_m, 'STEPS'].shift()
).groupby(
    # Create groups that start at the row after a FULL == 1 and end at the next FULL == 1
    df.FULL.shift().cumsum().fillna(0)
)[['STEP_DIFF', 'QUANTITY']].transform('sum')
# Put in the average or 0
df['AVERAGE'] = np.where(full_m, sums.STEP_DIFF / sums.QUANTITY, 0)
# Display
print(df.to_string())

Output:

                               CONTROL_ID   STEPS  QUANTITY  FULL   AVERAGE
ENGINE_ID TIME
A123B     2021-03-04 10:10:17           1  354815    156.32     1  0.000000
          2021-03-04 11:18:51           2  355160     85.49     0  0.000000
          2021-03-04 12:50:24           3  355428    100.00     0  0.000000
          2021-03-04 13:02:02           4  357850    157.02     1  8.861055
          2021-03-04 14:37:23           5  358314    134.00     0  0.000000
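The sample data contains a single engine. If the frame held several engines, the same idea could be applied per ENGINE_ID, as in the sketch below. The second engine `B456C` and its values are made up for illustration, `BLOCK` is a hypothetical helper name, and the frame uses a flat index with ENGINE_ID as a regular column to keep the example short.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # 'B456C' is an invented second engine, not part of the question's data
    "ENGINE_ID": ["A123B"] * 5 + ["B456C"] * 3,
    "STEPS": [354815, 355160, 355428, 357850, 358314, 100, 200, 400],
    "QUANTITY": [156.32, 85.49, 100.00, 157.02, 134.00, 10.0, 20.0, 30.0],
    "FULL": [1, 0, 0, 1, 0, 1, 0, 1],
})

full_m = df["FULL"].eq(1)

# STEPS difference between consecutive FULL == 1 rows, computed within each engine
step_diff = df.loc[full_m].groupby("ENGINE_ID")["STEPS"].diff()

# Per engine: blocks that start on the row after a full row and end on the next full row
block = (
    df.groupby("ENGINE_ID")["FULL"]
    .transform(lambda s: s.shift().cumsum().fillna(0))
    .rename("BLOCK")
)

# Sum the step difference and the quantity inside each (engine, block) pair
sums = (
    df.assign(STEP_DIFF=step_diff)
    .groupby(["ENGINE_ID", block])[["STEP_DIFF", "QUANTITY"]]
    .transform("sum")
)

# Average on full rows, 0 everywhere else
df["AVERAGE"] = np.where(full_m, sums["STEP_DIFF"] / sums["QUANTITY"], 0)
print(df.to_string())
```

Grouping by the engine as well as the block keeps the last rows of one engine from being merged with the first rows of the next.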