Python: Replace a Series inside a MultiIndex with another Series of a different length


I have a Series with a MultiIndex, and I want to change it.

Suppose I have the following Series, named ser:

gbd_wijk_naam       gbd_buurt_naam    cluster_id             weging_datum_weging
Centrale Markt      Ecowijk           119617.877|488566.830  2017-05-07             20.248457
                                                             2017-05-21             23.558438
                                                             2017-05-28             40.910273
                                                             2017-06-18             14.142136
                                                             2017-07-09             15.652476
                                                                                      ...    
Westindische Buurt  Postjeskade e.o.  118620.633|486116.648  2019-11-17             17.029386
                                                             2019-12-01             21.530015
                                                             2019-12-08             15.491933
                                                             2019-12-15             22.896061
                                                             2019-12-22             13.228757

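Eventually I want to do this for every index, but for now let's focus on just one.

I take the first index, i.e. ('Centrale Markt', 'Ecowijk', '119617.877|488566.830'). This returns the following Series: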

weging_datum_weging
2017-05-07    20.248457
2017-05-21    23.558438
2017-05-28    40.910273
2017-06-18    14.142136
2017-07-09    15.652476
2017-07-23    44.067607
2017-07-30    17.464249
2017-08-20    20.000000
2017-08-27    30.184594
2017-09-03    19.104973
2017-09-10    17.175564
2017-09-17    15.968719
2017-09-24    38.415531
2017-10-29    18.708287
2017-11-05    18.574176
2017-11-12    21.095023
2017-12-10    21.794495
2019-01-06    42.966652
2019-01-20    13.038405
2019-01-27    29.483345
2019-02-17    16.278821
2019-02-24    15.968719
2019-03-03    31.583124
2019-03-10    19.748418
2019-04-28    18.574176
2019-05-12    17.029386
2019-05-19    20.976177
2019-06-23    20.493902
2019-07-14    15.329710
2019-09-22    34.537485
2019-09-29    17.320508
2019-10-06    16.431677
2019-10-27    10.246951
2019-11-17    16.733201
2019-11-24    29.567957
Name: weging_netto_gewicht, dtype: float64

Its shape is (35,).

I want to replace all values under this index with the values of an interpolated Series that I generated:

_ = ser.loc[('Centrale Markt', 'Ecowijk', '119617.877|488566.830')]
upsampled = _.resample('D')
interpolated = upsampled.interpolate(method='linear')

This Series has shape (932,).

I can change the Series this way:

x = ser.loc[('Centrale Markt', 'Ecowijk', '119617.877|488566.830')]
x = x.reindex(interpolated.index)
x.update(interpolated)

which gives me

weging_datum_weging
2017-05-07    20.248457
2017-05-08    20.484884
2017-05-09    20.721311
2017-05-10    20.957738
2017-05-11    21.194166
                ...    
2019-11-20    22.233810
2019-11-21    24.067347
2019-11-22    25.900884
2019-11-23    27.734420
2019-11-24    29.567957
Freq: D, Name: weging_netto_gewicht, Length: 932, dtype: float64

I can't figure out how to put the result back into ser at index ('Centrale Markt', 'Ecowijk', '119617.877|488566.830').

When I try to do this for all indices, for example:

for idx, df_select in ser2.groupby(level=[0,1,2]):
    _ = ser.loc[idx]
    upsampled = _.resample('D')
    interpolated = upsampled.interpolate(method='linear')

    ser.loc[idx] = ser.loc[idx].reindex(interpolated.index)
    ser.loc[idx].update(interpolated)

interpolated is generated as it should be, but the second part does not update ser.

The way I currently make it work is:

for index, value in interpolated.items():
    new_df = new_df.append(
    {'gbd_wijk_naam': idx[0], \
    'gbd_buurt_naam': idx[1],\
    'cluster_id': idx[2],\
    'weging_datum_weging': index,\
    'weging_netto_gewicht': value}, ignore_index=True)

This appends the rows to a new df, which is later grouped again in the same way. It is extremely slow. How can we speed this up?
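
Because each interpolated group has more rows than the group it replaces, assigning it back through .loc cannot simply grow the group in place. One way around both that and the row-by-row append is to collect the interpolated pieces in a dict and rebuild the whole Series with a single pd.concat. The following is a minimal sketch under assumptions: the data and the names pieces and new_ser are made up for illustration, and only the level names come from the question.

```python
import pandas as pd

# Toy stand-in for `ser` (values are made up): three key levels plus a date level.
names = ["gbd_wijk_naam", "gbd_buurt_naam", "cluster_id", "weging_datum_weging"]
rows = [
    ("Wijk A", "Buurt 1", "c1", "2020-01-01", 10.0),
    ("Wijk A", "Buurt 1", "c1", "2020-01-05", 18.0),
    ("Wijk B", "Buurt 2", "c2", "2020-02-01", 5.0),
    ("Wijk B", "Buurt 2", "c2", "2020-02-03", 9.0),
]
index = pd.MultiIndex.from_tuples(
    [(w, b, c, pd.Timestamp(d)) for w, b, c, d, _ in rows], names=names
)
ser = pd.Series([r[-1] for r in rows], index=index, name="weging_netto_gewicht")

# Interpolate each group separately and keep the pieces in a dict
# keyed by the (wijk, buurt, cluster) tuple.
pieces = {}
for idx, group in ser.groupby(level=[0, 1, 2]):
    # Dropping the key levels leaves a plain DatetimeIndex, which resample needs.
    daily = group.droplevel([0, 1, 2]).resample("D").interpolate(method="linear")
    pieces[idx] = daily

# A single concat rebuilds the MultiIndex; the dict keys become the outer levels.
new_ser = pd.concat(pieces, names=names[:3])
```

new_ser then takes the place of ser. The concat runs once at the end instead of once per row, which is usually where the time goes in an append loop.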

Resampling works when the index is a DatetimeIndex, TimedeltaIndex, or PeriodIndex, but not with a MultiIndex the way you are using it now.
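
To see the restriction in isolation, here is a small sketch on made-up data (the level names group and date are hypothetical): calling resample on the MultiIndex Series is rejected, while selecting a single group first leaves a DatetimeIndex that can be resampled.

```python
import pandas as pd

# Toy data (made up for illustration): the second index level holds the dates.
idx = pd.MultiIndex.from_tuples(
    [("A", pd.Timestamp("2020-01-01")), ("A", pd.Timestamp("2020-01-04"))],
    names=["group", "date"],
)
ser = pd.Series([1.0, 4.0], index=idx, name="val")

# Resampling the MultiIndex Series directly raises a TypeError.
try:
    ser.resample("D")
    raised = False
except TypeError:
    raised = True

# Selecting one group drops the outer level, so the remaining index
# is a DatetimeIndex and resampling works.
daily = ser.loc["A"].resample("D").interpolate(method="linear")
```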

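Instead, you can set the timestamp column as the index, group by the other columns, and then resample and interpolate.

Using the following data for illustration: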

gbd_wijk_naam       gbd_buurt_naam    cluster_id             weging_datum_weging
Centrale Markt      Ecowijk           119617.877|488566.830  2017-05-07             20.248457
                                                             2017-05-21             23.558438
                                                             2017-05-28             40.910273

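Name the series and reset the index:

import pandas as pd

# `series` is the MultiIndex Series from the question (called `ser` there)
df = series.rename('val').reset_index()

Make sure the datetime column has the correct type:

df['weging_datum_weging'] = pd.to_datetime(df['weging_datum_weging'])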

Set the index, group by the other columns, resample, and interpolate:

(df.set_index('weging_datum_weging')
   .groupby(['gbd_wijk_naam', 'gbd_buurt_naam', 'cluster_id'])
   .val.apply(lambda s: s.resample('D').interpolate('linear')))

This produces the output:

gbd_wijk_naam   gbd_buurt_naam  cluster_id             weging_datum_weging
Centrale Markt  Ecowijk         119617.877|488566.830  2017-05-07             20.248457
                                                       2017-05-08             20.484884
                                                       2017-05-09             20.721311
                                                       2017-05-10             20.957739
                                                       2017-05-11             21.194166
                                                                                ...
                                                       2017-07-05             15.364792
                                                       2017-07-06             15.436713
                                                       2017-07-07             15.508634
                                                       2017-07-08             15.580555
                                                       2017-07-09             15.652476
Name: val, Length: 64, dtype: float64
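
For reference, the steps above can be put together into one self-contained sketch. It uses only the three sample rows shown earlier (so the result is shorter than the 64-row output above), and the dates are stored as strings on purpose so that the pd.to_datetime step has something to do; the variable names names, index, and result are illustrative assumptions.

```python
import pandas as pd

# The three sample rows from the illustration data, with dates as plain strings.
names = ["gbd_wijk_naam", "gbd_buurt_naam", "cluster_id", "weging_datum_weging"]
index = pd.MultiIndex.from_tuples(
    [
        ("Centrale Markt", "Ecowijk", "119617.877|488566.830", "2017-05-07"),
        ("Centrale Markt", "Ecowijk", "119617.877|488566.830", "2017-05-21"),
        ("Centrale Markt", "Ecowijk", "119617.877|488566.830", "2017-05-28"),
    ],
    names=names,
)
series = pd.Series([20.248457, 23.558438, 40.910273], index=index)

# Name the series and turn the index levels into ordinary columns.
df = series.rename("val").reset_index()

# Convert the date column to datetime64 so it can act as a DatetimeIndex.
df["weging_datum_weging"] = pd.to_datetime(df["weging_datum_weging"])

# Per group: upsample to daily frequency and fill the gaps linearly.
result = (
    df.set_index("weging_datum_weging")
    .groupby(["gbd_wijk_naam", "gbd_buurt_naam", "cluster_id"])["val"]
    .apply(lambda s: s.resample("D").interpolate(method="linear"))
)
```

result is again a Series with the three key levels followed by the date level, so it can be used wherever the original MultiIndex Series was used.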