Python: slice off NaN values before the first and after the last valid value for each time series in a DataFrame

python pandas dataframe


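I'm working with time series in Python 3 and pandas. I have a DataFrame containing multiple time series (two in this example), each holding the sales data of one shop. The DataFrame looks like this: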

                  index  Shop  Quantity
index Date                             
0     2017-01-08      0     1       NaN
1     2017-01-15      1     1       NaN
2     2017-01-22      2     1      34.0
3     2017-01-29      3     1      54.0
4     2017-02-05      4     1      42.0
5     2017-02-12      5     1       NaN
6     2017-01-08      6     2       NaN
7     2017-01-15      7     2      29.0
8     2017-01-22      8     2       NaN
9     2017-01-29      9     2      58.0
10    2017-02-05     10     2      49.0
11    2017-02-12     11     2       NaN

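For each time series, I want to drop the NaN values before the first valid value and after the last valid value, while keeping any NaN in between. The result should look like this: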

                  index  Shop  Quantity
index Date                             
2     2017-01-22      2     1      34.0
3     2017-01-29      3     1      54.0
4     2017-02-05      4     1      42.0
7     2017-01-15      7     2      29.0
8     2017-01-22      8     2       NaN
9     2017-01-29      9     2      58.0
10    2017-02-05     10     2      49.0

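However, the following code only trims the NaN values before the first and after the last valid value of the whole DataFrame, so the rows with index 5 and 6 are not removed: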

df = df.loc[df['Quantity'].first_valid_index():df['Quantity'].last_valid_index()]

Is there a way to solve this? Thanks for your help.
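To make the problem concrete, here is a minimal sketch that rebuilds the example and shows what the whole-frame slice returns. It assumes a plain integer index with `Date` as an ordinary column rather than the `(index, Date)` MultiIndex shown above; the names `first`, `last` and `sliced` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example with a default integer index (an assumption; the
# original frame uses an (index, Date) MultiIndex).
dates = pd.to_datetime(['2017-01-08', '2017-01-15', '2017-01-22',
                        '2017-01-29', '2017-02-05', '2017-02-12'])
df = pd.DataFrame({
    'Date': list(dates) * 2,
    'Shop': [1] * 6 + [2] * 6,
    'Quantity': [np.nan, np.nan, 34, 54, 42, np.nan,
                 np.nan, 29, np.nan, 58, 49, np.nan],
})

first = df['Quantity'].first_valid_index()  # first non-NaN row of the whole column
last = df['Quantity'].last_valid_index()    # last non-NaN row of the whole column
sliced = df.loc[first:last]

# Rows 5 and 6 lie between `first` and `last`, so the slice keeps them.
print(first, last)
print(sliced.index.tolist())
```

Because the two boundary labels are computed once for the entire column, the trailing NaN of shop 1 and the leading NaN of shop 2 fall inside the slice.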

Use:

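import pandas as pd

# Assumes df has a default integer index and a 'Date' column.
# A new series starts wherever the date does not increase relative to the previous row.
starts = df.index[~(df['Date'] > df['Date'].shift())].to_list()
bounds = starts + [len(df)]
list_of_dfs = [df.iloc[bounds[n]:bounds[n + 1]] for n in range(len(bounds) - 1)]

# Trim each series to the span between its first and last valid Quantity.
# DataFrame.append was removed in pandas 2.0, so collect the pieces and concatenate them.
trimmed = [d.loc[d['Quantity'].first_valid_index():d['Quantity'].last_valid_index()]
           for d in list_of_dfs]
df_new = pd.concat(trimmed)
df_new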

         Date index.1 Shop  Quantity
2  2017-01-22       2    1      34.0
3  2017-01-29       3    1      54.0
4  2017-02-05       4    1      42.0
7  2017-01-15       7    2      29.0
8  2017-01-22       8    2       NaN
9  2017-01-29       9    2      58.0
10 2017-02-05      10    2      49.0
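The same boundary detection can be expressed without manual list slicing. The following is a sketch, not the answerer's code: it labels each series with a cumulative sum over the "date did not increase" flag and trims every group in a loop. It assumes a default integer index and a `Date` column, and the names `is_start`, `series_id` and `pieces` are hypothetical.

```python
import numpy as np
import pandas as pd

# Example data with a default integer index (an assumption).
dates = pd.to_datetime(['2017-01-08', '2017-01-15', '2017-01-22',
                        '2017-01-29', '2017-02-05', '2017-02-12'])
df = pd.DataFrame({
    'Date': list(dates) * 2,
    'Shop': [1] * 6 + [2] * 6,
    'Quantity': [np.nan, np.nan, 34, 54, 42, np.nan,
                 np.nan, 29, np.nan, 58, 49, np.nan],
})

# A new series starts wherever the date fails to increase. The first row
# counts too, because comparing against the shifted NaT yields False.
is_start = ~(df['Date'] > df['Date'].shift())
series_id = is_start.cumsum().rename('series_id')  # 1, 1, ..., 2, 2, ...

pieces = []
for _, d in df.groupby(series_id):
    first = d['Quantity'].first_valid_index()
    last = d['Quantity'].last_valid_index()
    if first is not None:  # skip a series that is entirely NaN
        pieces.append(d.loc[first:last])

df_new = pd.concat(pieces)
print(df_new.index.tolist())
```

The explicit `None` check matters because `d.loc[None:None]` would return the whole group unchanged for an all-NaN series.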

Let's use groupby together with first_valid_index and last_valid_index, slicing the index with loc:

df.groupby('Shop', group_keys=False)\
  .apply(lambda x: x.loc[x['Quantity'].first_valid_index():x['Quantity'].last_valid_index()])

Output:

                  ind  Shop  Quantity
index Date                           
2     2017-01-22    2     1      34.0
3     2017-01-29    3     1      54.0
4     2017-02-05    4     1      42.0
7     2017-01-15    7     2      29.0
8     2017-01-22    8     2       NaN
9     2017-01-29    9     2      58.0
10    2017-02-05   10     2      49.0
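As an alternative to `apply`, the trimming can also be written as a boolean mask. This is a sketch under the assumption of a default integer index and a `Shop` column: a row is kept only if its group has a valid value at or before it (forward fill is not NaN) and at or after it (backward fill is not NaN). The names `g`, `keep` and `trimmed` are illustrative.

```python
import numpy as np
import pandas as pd

# Example data with a default integer index (an assumption).
dates = pd.to_datetime(['2017-01-08', '2017-01-15', '2017-01-22',
                        '2017-01-29', '2017-02-05', '2017-02-12'])
df = pd.DataFrame({
    'Date': list(dates) * 2,
    'Shop': [1] * 6 + [2] * 6,
    'Quantity': [np.nan, np.nan, 34, 54, 42, np.nan,
                 np.nan, 29, np.nan, 58, 49, np.nan],
})

g = df.groupby('Shop')['Quantity']

# Within each shop, ffill leaves only the leading NaNs unfilled and
# bfill leaves only the trailing NaNs unfilled.
keep = g.ffill().notna() & g.bfill().notna()

# The mask selects rows of the original frame, so interior NaNs stay NaN.
trimmed = df[keep]
print(trimmed.index.tolist())
```

This avoids calling a Python function per group, and a shop whose values are all NaN is dropped entirely instead of needing a special case.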