Pandas: fill in missing observations for the dates in an interval of a DataFrame
Suppose I have the following DataFrame:
+---------------------+---------+-------+-----+
| observed_cats_count | year | month | day |
+---------------------+---------+-------+-----+
| 2 | 2019 | 10 | 19 |
| 3 | 2019 | 10 | 18 |
| 5 | 2019 | 10 | 16 |
+---------------------+---------+-------+-----+
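There are also two boundary dates, say 2019-10-15 and 2019-10-20, and I know that every missing observation should have observed_cats_count = 0.
How can I insert a row for every missing date in the interval and get the following DataFrame: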
+---------------------+---------+-------+-----+
| observed_cats_count | year | month | day |
+---------------------+---------+-------+-----+
| 0 | 2019 | 10 | 20 |
| 2 | 2019 | 10 | 19 |
| 3 | 2019 | 10 | 18 |
| 0 | 2019 | 10 | 17 |
| 5 | 2019 | 10 | 16 |
| 0 | 2019 | 10 | 15 |
+---------------------+---------+-------+-----+
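The idea is to create a DatetimeIndex with pd.to_datetime, so that DataFrame.reindex can be applied with all the datetimes created by pd.date_range. Then create the year, month and day columns from the DatetimeIndex, sort with sort_index, and finally remove the index with reset_index and drop=True: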
import pandas as pd

# All dates in the interval, both endpoints included
rng = pd.date_range('2019-10-15', '2019-10-20')
df = (df.set_index(pd.to_datetime(df[['year', 'month', 'day']]))['observed_cats_count']
        .reindex(rng, fill_value=0)   # missing dates get a count of 0
        .to_frame()
        .assign(year=lambda x: x.index.year,
                month=lambda x: x.index.month,
                day=lambda x: x.index.day)
        .sort_index(ascending=False)  # newest date first
        .reset_index(drop=True))
print(df)
   observed_cats_count  year  month  day
0                    0  2019     10   20
1                    2  2019     10   19
2                    3  2019     10   18
3                    0  2019     10   17
4                    5  2019     10   16
5                    0  2019     10   15
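For readers who want to run the reindex approach end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and wraps the steps in a helper; the name `fill_missing_dates` and its parameters are hypothetical, not part of pandas or of the answer above. It assumes at most one row per date, because `reindex` raises on duplicate index labels.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "observed_cats_count": [2, 3, 5],
    "year": [2019, 2019, 2019],
    "month": [10, 10, 10],
    "day": [19, 18, 16],
})


def fill_missing_dates(frame, start, end, value_col="observed_cats_count"):
    """Hypothetical helper: one row per day in [start, end], newest first."""
    # pd.to_datetime assembles dates from columns named year/month/day
    dates = pd.to_datetime(frame[["year", "month", "day"]])
    full_range = pd.date_range(start, end, freq="D")
    filled = (
        frame.set_index(dates)[value_col]
        .reindex(full_range, fill_value=0)  # dates absent from frame get 0
        .to_frame()
    )
    # Rebuild the calendar columns from the completed index
    filled["year"] = filled.index.year
    filled["month"] = filled.index.month
    filled["day"] = filled.index.day
    return filled.sort_index(ascending=False).reset_index(drop=True)


result = fill_missing_dates(df, "2019-10-15", "2019-10-20")
print(result)
```

Because `fill_value=0` is supplied at reindex time, no NaN is ever introduced and the count column keeps its integer dtype.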
I would use pd.date_range to build a new DataFrame, then merge it back with df and call fillna:
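import pandas as pd

# Descending daily range, so the newest date comes first
dates = pd.date_range('2019-10-20', '2019-10-15', freq='-1D')
df1 = pd.DataFrame({'year': dates.year, 'month': dates.month, 'day': dates.day})
# Left merge on the shared year/month/day columns; unmatched dates get NaN, then 0.
# The count column ends up as float because NaN was introduced before fillna.
df2 = df1.merge(df, how='left').fillna(0)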
Out[413]:
   year  month  day  observed_cats_count
0  2019     10   20                  0.0
1  2019     10   19                  2.0
2  2019     10   18                  3.0
3  2019     10   17                  0.0
4  2019     10   16                  5.0
5  2019     10   15                  0.0
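The merge output above shows the counts as floats. If integer counts are wanted, one possible variant is to cast the column back after filling. This is a sketch under the assumption that the counts are whole numbers; the names `calendar` and `merged` are illustrative only, and the merge keys are spelled out explicitly with `on=` rather than inferred.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "observed_cats_count": [2, 3, 5],
    "year": [2019, 2019, 2019],
    "month": [10, 10, 10],
    "day": [19, 18, 16],
})

# One row per day in the interval, newest first
dates = pd.date_range("2019-10-20", "2019-10-15", freq="-1D")
calendar = pd.DataFrame({"year": dates.year, "month": dates.month, "day": dates.day})

# A left merge keeps every calendar row, in calendar order
merged = calendar.merge(df, on=["year", "month", "day"], how="left")

# Fill only the count column, then restore an integer dtype
merged["observed_cats_count"] = merged["observed_cats_count"].fillna(0).astype("int64")
print(merged)
```

Filling a single column rather than the whole frame avoids silently writing 0 into other columns that might contain NaN for unrelated reasons.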