Python: stretching a DataFrame by date


I have this DataFrame:

ID      date  X1  X2  Y
 A  16-07-19  58  50  0
 A  21-07-19  28  74  0
 B  25-07-19  54  65  1
 B  27-07-19  50  30  0
 B  29-07-19  81  61  0
 C  30-07-19  55  29  0
 C  31-07-19  97  69  1
 C  03-08-19  13  48  1
 D  19-07-18  77  27  1
 D  20-07-18  68  50  1
 D  22-07-18  89  57  1
 D  23-07-18  46  70  0
 D  26-07-18  56  13  0
 E  06-08-19  47  35  1
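
To reproduce the problem, the table above can be rebuilt as a DataFrame. This is only a sketch: the values are copied from the table, and the variable names rows and df are chosen for illustration.

```python
import pandas as pd

# Sample data copied from the table above; dates are day-month-year strings.
rows = [
    ("A", "16-07-19", 58, 50, 0),
    ("A", "21-07-19", 28, 74, 0),
    ("B", "25-07-19", 54, 65, 1),
    ("B", "27-07-19", 50, 30, 0),
    ("B", "29-07-19", 81, 61, 0),
    ("C", "30-07-19", 55, 29, 0),
    ("C", "31-07-19", 97, 69, 1),
    ("C", "03-08-19", 13, 48, 1),
    ("D", "19-07-18", 77, 27, 1),
    ("D", "20-07-18", 68, 50, 1),
    ("D", "22-07-18", 89, 57, 1),
    ("D", "23-07-18", 46, 70, 0),
    ("D", "26-07-18", 56, 13, 0),
    ("E", "06-08-19", 47, 35, 1),
]
df = pd.DataFrame(rows, columns=["ID", "date", "X1", "X2", "Y"])
print(df.shape)
```
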
I want to "stretch" the data by date, from the first row to the last row of each ID (groupby), and fill the missing values with NaN.

For example, ID A has two rows, on 16-07-19 and 21-07-19. After the operation it should have 6 rows, one for each day from 16-07-19 through 21-07-19.

Expected result:

ID      date    X1    X2    Y
 A  16-07-19  58.0  50.0  0.0
 A  17-07-19   NaN   NaN  NaN
 A  18-07-19   NaN   NaN  NaN
 A  19-07-19   NaN   NaN  NaN
 A  20-07-19   NaN   NaN  NaN
 A  21-07-19  28.0  74.0  0.0
 B  25-07-19  54.0  65.0  1.0
 B  26-07-19   NaN   NaN  NaN
 B  27-07-19  50.0  30.0  0.0
 B  28-07-19   NaN   NaN  NaN
 B  29-07-19  81.0  61.0  0.0
 C  30-07-19  55.0  29.0  0.0
 C  31-07-19  97.0  69.0  1.0
 C  01-08-19   NaN   NaN  NaN
 C  02-08-19   NaN   NaN  NaN
 C  03-08-19  13.0  48.0  1.0
 D  19-07-18  77.0  27.0  1.0
 D  20-07-18  68.0  50.0  1.0
 D  21-07-18   NaN   NaN  NaN
 D  22-07-18  89.0  57.0  1.0
 D  23-07-18  46.0  70.0  0.0
 D  24-07-18   NaN   NaN  NaN
 D  25-07-18   NaN   NaN  NaN
 D  26-07-18  56.0  13.0  0.0
 E  06-08-19  47.0  35.0  1.0
Use asfreq per group, working with a DatetimeIndex:

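import pandas as pd

# Parse the day-month-year strings into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# Value columns (everything except the keys), in their original order
cols = df.columns.difference(['date','ID'], sort=False)
# Per ID, index by date and add the missing days as NaN rows
df = df.set_index('date').groupby('ID')[cols].apply(lambda x: x.asfreq('D')).reset_index()
print(df)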
   ID       date    X1    X2    Y
0   A 2019-07-16  58.0  50.0  0.0
1   A 2019-07-17   NaN   NaN  NaN
2   A 2019-07-18   NaN   NaN  NaN
3   A 2019-07-19   NaN   NaN  NaN
4   A 2019-07-20   NaN   NaN  NaN
5   A 2019-07-21  28.0  74.0  0.0
6   B 2019-07-25  54.0  65.0  1.0
7   B 2019-07-26   NaN   NaN  NaN
8   B 2019-07-27  50.0  30.0  0.0
9   B 2019-07-28   NaN   NaN  NaN
10  B 2019-07-29  81.0  61.0  0.0
11  C 2019-07-30  55.0  29.0  0.0
12  C 2019-07-31  97.0  69.0  1.0
13  C 2019-08-01   NaN   NaN  NaN
14  C 2019-08-02   NaN   NaN  NaN
15  C 2019-08-03  13.0  48.0  1.0
16  D 2018-07-19  77.0  27.0  1.0
17  D 2018-07-20  68.0  50.0  1.0
18  D 2018-07-21   NaN   NaN  NaN
19  D 2018-07-22  89.0  57.0  1.0
20  D 2018-07-23  46.0  70.0  0.0
21  D 2018-07-24   NaN   NaN  NaN
22  D 2018-07-25   NaN   NaN  NaN
23  D 2018-07-26  56.0  13.0  0.0
24  E 2019-08-06  47.0  35.0  1.0
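
The output above shows the dates as datetimes (2019-07-16), while the expected result in the question uses day-month-year strings. If that layout matters, the column can be formatted back afterwards. A minimal sketch, using a small stand-in frame named out (a hypothetical name) that covers ID A only:

```python
import pandas as pd

# Stand-in for the stretched frame: ID A only, with a datetime64 date column.
out = pd.DataFrame({
    "ID": ["A"] * 6,
    "date": pd.date_range("2019-07-16", "2019-07-21", freq="D"),
})

# Render the dates as day-month-year strings, matching the question's layout.
out["date"] = out["date"].dt.strftime("%d-%m-%y")
print(out["date"].tolist())
```

Note that after this step the column holds strings, so it is best done last, once all date arithmetic is finished.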
Another idea, using reindex per group:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

cols = df.columns.difference(['date','ID'], sort=False)
# name='date' keeps the index label, so reset_index() yields a 'date' column rather than 'level_1'
f = lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), name='date'))
df = df.set_index('date').groupby('ID')[cols].apply(f).reset_index()
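
As a sanity check, the per-group reindex idea can be tried on a reduced sample. The sketch below uses only IDs A and B and one value column; the names small, fill_days and out are illustrative, and the explicit format string is an assumption about how the dates are written.

```python
import pandas as pd

small = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "date": ["16-07-19", "21-07-19", "25-07-19", "27-07-19", "29-07-19"],
    "X1": [58, 28, 54, 50, 81],
})
small["date"] = pd.to_datetime(small["date"], format="%d-%m-%y")

def fill_days(group):
    # Build the full daily range between the group's first and last date;
    # naming it 'date' keeps the column label after reset_index().
    days = pd.date_range(group.index.min(), group.index.max(), freq="D", name="date")
    return group.reindex(days)

out = small.set_index("date").groupby("ID")[["X1"]].apply(fill_days).reset_index()
print(out.groupby("ID").size())
```

ID A should end up with 6 rows and ID B with 5, with NaN in X1 on the days that were added.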
Here is the trick I like:

def Sort_by_date(dataf):
    # rule 1: parse the 'Current' column as datetimes and sort ascending
    dataf['Current'] = pd.to_datetime(dataf['Current'])
    dataf = dataf.sort_values(by=['Current'], ascending=True)
    # rule 2: keep rows after 2020-01-01, up to and including 2022-12-31
    mask = (dataf['Current'] > '2020-01-01') & (dataf['Current'] <= '2022-12-31')
    dataf = dataf.loc[mask]
    return dataf
Finally, let's replace the NaN values:

Week1_Report['X1'] = Week1_Report['X1'].fillna("X1 is 0")
Week1_Report['X2'] = Week1_Report['X2'].fillna("X2 is 0")
Week1_Report['Y'] = Week1_Report['Y'].fillna("Y is 0")
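
Filling numeric columns with text typically turns them into object dtype, which makes later arithmetic awkward. If a numeric placeholder is acceptable, a dict passed to fillna handles all columns in one call. A sketch under that assumption, with report as a hypothetical stand-in for the stretched frame:

```python
import numpy as np
import pandas as pd

# Stand-in for a stretched frame with gaps in every value column.
report = pd.DataFrame({
    "X1": [58.0, np.nan, 28.0],
    "X2": [50.0, np.nan, 74.0],
    "Y": [0.0, np.nan, 0.0],
})

# Map each column to its own fill value; columns stay numeric.
filled = report.fillna({"X1": 0, "X2": 0, "Y": 0})
print(filled.dtypes)
```

Whether 0 is a sensible substitute depends on what the columns mean; leaving the NaN values in place is often the safer default.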

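You can also use the following (with date already converted to datetime):
df.set_index(["ID", "date"]).reindex([(name, time) for name, x in df.groupby("ID") for time in pd.date_range(x["date"].min(), x["date"].max())])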
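
The same MultiIndex idea can be spelled out step by step. This is a sketch on a reduced sample (IDs A and B only); the names small, pairs and full_index are illustrative, and building the target with pd.MultiIndex.from_tuples is one way to make the level names explicit.

```python
import pandas as pd

small = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "date": pd.to_datetime(
        ["16-07-19", "21-07-19", "25-07-19", "27-07-19", "29-07-19"],
        format="%d-%m-%y",
    ),
    "X1": [58, 28, 54, 50, 81],
})

# One (ID, day) pair for every day between each ID's first and last date.
pairs = [
    (name, day)
    for name, group in small.groupby("ID")
    for day in pd.date_range(group["date"].min(), group["date"].max(), freq="D")
]
full_index = pd.MultiIndex.from_tuples(pairs, names=["ID", "date"])

# Reindexing against the full set of pairs adds the missing days as NaN rows.
out = small.set_index(["ID", "date"]).reindex(full_index).reset_index()
print(len(out))
```
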