Python: calculating the mean of the last 3 records in time-series data (in seconds)


python pandas numpy


I'm having trouble calculating the mean of the last 3 records in a time-series DataFrame. Below is a data sample:

serial,date,feature1,,,,,,,,,,,,,,,,,
1,5/19/2017,-5.199338,,,,,,,,,,,,,,,,,
5,6/12/2017,-25.199338,,,,,,,,,,,,,,,,,
5,6/23/2017,5.199338,,,,,,,,,,,,,,,,,
2,7/1/2017,8.199338,,,,,,,,,,,,,,,,,
1,7/17/2017,3.199338,,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,,,,,,,,,,,,,,,,,
2,8/19/2017,13.199338,,,,,,,,,,,,,,,,,
6,9/19/2017,785.199338,,,,,,,,,,,,,,,,,
3,10/28/2017,5.199338,,,,,,,,,,,,,,,,,
4,11/2/2017,67.199338,,,,,,,,,,,,,,,,,
2,11/28/2017,49.199338,,,,,,,,,,,,,,,,,
2,12/29/2017,20.199338,,,,,,,,,,,,,,,,,
3,1/29/2018,19.199338,,,,,,,,,,,,,,,,,
4,3/13/2018,-15.199338,,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,
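To experiment with this sample, one way to load it is sketched below. It assumes the data lives in a CSV string (the number of trailing commas is shortened here for brevity); the trailing commas produce empty "Unnamed" columns, which are dropped, and the date strings are parsed so the rows can be ordered chronologically.

```python
import io

import pandas as pd

# Sample rows from the question; trailing commas shortened (assumption for brevity).
CSV_TEXT = """serial,date,feature1,,,
1,5/19/2017,-5.199338,,,
5,6/12/2017,-25.199338,,,
5,6/23/2017,5.199338,,,
2,7/1/2017,8.199338,,,
1,7/17/2017,3.199338,,,
1,7/29/2017,76.199338,,,
2,8/19/2017,13.199338,,,
6,9/19/2017,785.199338,,,
3,10/28/2017,5.199338,,,
4,11/2/2017,67.199338,,,
2,11/28/2017,49.199338,,,
2,12/29/2017,20.199338,,,
3,1/29/2018,19.199338,,,
4,3/13/2018,-15.199338,,,
1,3/28/2018,-5.199338,,,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# The empty trailing fields become all-NaN "Unnamed: N" columns; drop them.
df = df.dropna(how="all", axis=1)
# Parse month/day/year strings into timestamps.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
print(df.dtypes)
```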

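I need to add another column to the DataFrame, such as mean, holding the mean of feature1 over the last 3 rows (including the current one) that have the same serial number. This has to be done for every row.
For example, the mean for the following row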
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,

is computed from these rows:
1,7/17/2017,3.199338,,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,,,,,,,,,,,,,,,,,

After computing the mean, the row will look like this:
serial,date,feature1,mean_feature1,,,,,,,,,,,,,,,,,
...........................
1,3/28/2018,-5.199338,24.7333,,,,,,,,,,,,,,,,
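The target value here is just the arithmetic mean of the three feature1 values listed above. A quick check in plain Python:

```python
# feature1 of the last three serial-1 rows (7/17/2017, 7/29/2017, 3/28/2018)
last_three = [3.199338, 76.199338, -5.199338]
mean_feature1 = sum(last_three) / len(last_three)
print(round(mean_feature1, 6))  # 24.733113
```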

My problem is similar to the post below, but that one uses rolling, which requires an explicit window, and in my case the window is random:

Expected output:
serial,date,feature1,mean_feature1,,,,,,,,,,,,,,,,
1,5/19/2017,-5.199338,-5.199338,,,,,,,,,,,,,,,,
5,6/12/2017,-25.199338,-25.199338,,,,,,,,,,,,,,,,
5,6/23/2017,5.199338,-10.0,,,,,,,,,,,,,,,,
2,7/1/2017,8.199338,8.199338,,,,,,,,,,,,,,,,
1,7/17/2017,3.199338,-1,,,,,,,,,,,,,,,,
1,7/29/2017,76.199338,24.xxx,,,,,,,,,,,,,,,,
2,8/19/2017,13.199338,10.7xx,,,,,,,,,,,,,,,,
6,9/19/2017,785.199338,785.199338,,,,,,,,,,,,,,,,
3,10/28/2017,5.199338,5.199338,,,,,,,,,,,,,,,,
4,11/2/2017,67.199338,67.199338,,,,,,,,,,,,,,,,
2,11/28/2017,49.199338,23.xxx,,,,,,,,,,,,,,,,
2,12/29/2017,20.199338,27.xx,,,,,,,,,,,,,,,,
3,1/29/2018,19.199338,12.xxx,,,,,,,,,,,,,,,,
4,3/13/2018,-15.199338,26.xxxx,,,,,,,,,,,,,,,,
1,3/28/2018,-5.199338,24.xxxxx,,,,,,,,,,,,,,,,

Note that the values in the mean_feature1 column above are approximate.
You need groupby with rolling and mean:
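On the concern that rolling needs an explicit window while the dates here are irregularly spaced: an integer window in pandas rolling counts rows, not elapsed time, so rolling(3) always means "the current row and the 2 before it", however far apart the dates are. The sketch below uses the serial-1 rows only and contrasts this with a time-based window ("90D" is an arbitrary choice for illustration).

```python
import pandas as pd

# feature1 for serial 1, indexed by its irregularly spaced dates
s = pd.Series(
    [-5.199338, 3.199338, 76.199338, -5.199338],
    index=pd.to_datetime(["2017-05-19", "2017-07-17", "2017-07-29", "2018-03-28"]),
)

# Integer window: the last 3 observations, regardless of the gaps between them.
by_rows = s.rolling(3, min_periods=1).mean()

# Offset window (for contrast): only observations within the last 90 days.
by_time = s.rolling("90D").mean()

print(pd.DataFrame({"by_rows": by_rows, "by_time": by_time}))
```

Only the row-count window matches the expected output; the time-based one drops the earlier rows on 3/28/2018 because they are more than 90 days old.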
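# if necessary, remove the columns that contain only NaNs
df = df.dropna(how='all', axis=1)
# level=0 drops the 'serial' level added by groupby, so the result aligns
# back to df by its original row index
df['mean_feature1'] = (df.groupby('serial', sort=False)['feature1']
                         .rolling(3, min_periods=1).mean()
                         .reset_index(level=0, drop=True))
print(df)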

    serial        date    feature1  mean_feature1
0        1   5/19/2017   -5.199338      -5.199338
1        5   6/12/2017  -25.199338     -25.199338
2        5   6/23/2017    5.199338     -10.000000
3        2    7/1/2017    8.199338       8.199338
4        1   7/17/2017    3.199338      -1.000000
5        1   7/29/2017   76.199338      24.733113
6        2   8/19/2017   13.199338      10.699338
7        6   9/19/2017  785.199338     785.199338
8        3  10/28/2017    5.199338       5.199338
9        4   11/2/2017   67.199338      67.199338
10       2  11/28/2017   49.199338      23.532671
11       2  12/29/2017   20.199338      27.532671
12       3   1/29/2018   19.199338      12.199338
13       4   3/13/2018  -15.199338      26.000000
14       1   3/28/2018   -5.199338      24.733113
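A self-contained version of this answer, as a sketch: it rebuilds the sample data, parses and sorts by date first (the "last 3 rows" only make sense in chronological order; the sample happens to be sorted already), and then applies the per-serial rolling mean. Note reset_index(level=0, drop=True): groupby(...).rolling(...) returns rows ordered by group, so dropping the whole index instead would misalign the values.

```python
import pandas as pd

# Sample data from the question (empty trailing columns already dropped).
df = pd.DataFrame({
    "serial": [1, 5, 5, 2, 1, 1, 2, 6, 3, 4, 2, 2, 3, 4, 1],
    "date": ["5/19/2017", "6/12/2017", "6/23/2017", "7/1/2017", "7/17/2017",
             "7/29/2017", "8/19/2017", "9/19/2017", "10/28/2017", "11/2/2017",
             "11/28/2017", "12/29/2017", "1/29/2018", "3/13/2018", "3/28/2018"],
    "feature1": [-5.199338, -25.199338, 5.199338, 8.199338, 3.199338,
                 76.199338, 13.199338, 785.199338, 5.199338, 67.199338,
                 49.199338, 20.199338, 19.199338, -15.199338, -5.199338],
})

# Parse dates and sort chronologically (stable sort keeps ties in input order).
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
df = df.sort_values("date", kind="stable")

# Mean of the current row and up to 2 earlier rows with the same serial.
# The result has a (serial, original index) MultiIndex; dropping level 0
# leaves the original index, so assignment aligns row by row.
df["mean_feature1"] = (
    df.groupby("serial", sort=False)["feature1"]
      .rolling(3, min_periods=1)
      .mean()
      .reset_index(level=0, drop=True)
)
print(df)
```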

If you want the new column at a specific position, use DataFrame.insert:
df.insert(3, 'mean_feature1', (df.groupby('serial', sort=False)['feature1']
                                 .rolling(3, min_periods=1).mean()
                                 .reset_index(level=0, drop=True)))
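A small sketch of what the position argument does, using a subset of the sample rows. The extra column named other is hypothetical, standing in for whatever columns follow feature1; DataFrame.insert places the new column at 0-based position 3, whereas plain df['mean_feature1'] = ... would append it at the end.

```python
import pandas as pd

# Subset of the sample data plus a hypothetical trailing column "other".
df = pd.DataFrame({
    "serial": [1, 5, 5, 1],
    "date": ["5/19/2017", "6/12/2017", "6/23/2017", "7/17/2017"],
    "feature1": [-5.199338, -25.199338, 5.199338, 3.199338],
    "other": [0, 0, 0, 0],  # hypothetical, only to show where the new column lands
})

rolled = (df.groupby("serial", sort=False)["feature1"]
            .rolling(3, min_periods=1).mean()
            .reset_index(level=0, drop=True))

# insert(loc, column, value): a Series value is aligned on df's index.
df.insert(3, "mean_feature1", rolled)
print(df.columns.tolist())
# ['serial', 'date', 'feature1', 'mean_feature1', 'other']
```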

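How do you determine the "similar serial number"?
@Cut7er, a similar serial number means the same value in the serial column (the first column in the dataset).
I think you need to sort by date after groupby, before apply().
What I want is to add a column for all records in my original DataFrame.
@JagrutTrivedi - do you need df['mean_feature1'] = df.groupby('serial', sort=False)['feature1'].transform(lambda x: x.tail(3).mean())?
What is the expected output?
@JagrutTrivedi - or do you need df['mean_feature1'] = df.groupby('serial', sort=False)['feature1'].rolling(3, min_periods=1).mean().reset_index(level=0, drop=True)?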
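The two suggestions in the comments give different results, which the sketch below shows on a few sample rows. transform with tail(3).mean() computes one value per serial (the mean of that serial's last 3 rows) and repeats it on every row of the group; rolling(3, min_periods=1) gives each row its own mean over itself and up to 2 earlier rows, which is what the expected output asks for.

```python
import pandas as pd

# A few sample rows (serials 1 and 5), already in date order.
df = pd.DataFrame({
    "serial": [1, 5, 5, 1, 1, 1],
    "feature1": [-5.199338, -25.199338, 5.199338, 3.199338, 76.199338, -5.199338],
})

g = df.groupby("serial", sort=False)["feature1"]

# One value per group (mean of the group's last 3 rows), broadcast to all its rows.
df["tail3_mean"] = g.transform(lambda x: x.tail(3).mean())

# A separate value per row: the row itself plus up to 2 earlier rows of that serial.
df["rolling_mean"] = g.rolling(3, min_periods=1).mean().reset_index(level=0, drop=True)

print(df)
```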