Python: fill in missing dates in a DataFrame
I have a dataframe that looks like this: it has 8 columns and n rows. The first column is a date with missing days (e.g. 1946-01-04 etc.), but it also contains duplicates (e.g. 1946-01-02). I would like code that keeps those duplicates but also fills in the missing dates, adding NaN to the other cells of each inserted row.

I tried this:

dfx = pd.DataFrame(None, index=pd.DatetimeIndex(start=df.地震の発生日時.min(),
                                                end=df.地震の発生日時.max(), freq='D'))
df = df.apply(pd.concat([df, dfx], join='outer', axis=1))

but it just appends the dates from .min() to .max() at the end of the file... I want them inserted into the data, like this:
Date       Time     Places w    x     y  z   col
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-03 04:36:00 6.5 35.5 139.5 50 3 1
1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
1946-01-06 10:56:00 8.1 41.5 143.4 51 5.2 3
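(One concrete problem with the attempt above: the pd.DatetimeIndex(start=..., end=..., freq=...) constructor was deprecated and later removed from pandas. pd.date_range is the supported way to build such an index; a minimal sketch, using the dates from the example:)

```python
import pandas as pd

# pd.date_range replaces the removed DatetimeIndex(start=..., end=..., freq=...)
# form: one entry per day between the two endpoints, inclusive.
idx = pd.date_range(start='1946-01-02', end='1946-01-06', freq='D')

# Empty frame with one row per day, ready to be combined with the data.
dfx = pd.DataFrame(index=idx)
```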
By the way, I can't use an inner join; it throws:

AttributeError: 'Places' is not a valid function for 'Series' object
Solution if the first column is a DatetimeIndex of dates without times:
print (df)
Time Places w x y z col
Date
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1
print (df.index)
DatetimeIndex(['1946-01-02', '1946-01-02', '1946-01-02', '1946-01-03',
'1946-01-05'],
dtype='datetime64[ns]', name='Date', freq=None)
Create a new DataFrame covering the full daily date range, and then join it to the original.
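(The code for this step did not survive here; a minimal sketch of what it presumably looked like, assuming the daily range is built directly as the new frame's index. The sample data is a cut-down version of the example above.)

```python
import pandas as pd

# Sample frame indexed by dates without times; note the duplicated
# 1946-01-02 and the gap between 1946-01-02 and 1946-01-05.
df = pd.DataFrame({'Time': ['14:45:00', '22:18:00', '04:36:00'],
                   'Places': [6.8, 7.6, 6.5]},
                  index=pd.DatetimeIndex(['1946-01-02', '1946-01-02',
                                          '1946-01-05'], name='Date'))

# Empty frame whose index is the complete daily range.
dfx = pd.DataFrame(index=pd.date_range(df.index.min(), df.index.max(),
                                       freq='D'))

# Left join on the index keeps every day: duplicated dates stay duplicated,
# and the missing days (1946-01-03, 1946-01-04) get NaN in every column.
df = dfx.join(df)
```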
If instead there is a DatetimeIndex that includes times, first create a DateTime column from the index, then strip the times with Series.dt.normalize, merge on the normalized dates, and finally replace the missing values in the DateTime column:
# normalize strips the time component, leaving the date at midnight
d = df['DateTime'].dt.normalize()
dfx = pd.DataFrame({'Dates':pd.date_range(start=d.min(),
                            end=d.max(), freq='D')})
print (dfx)
Dates
0 1946-01-02
1 1946-01-03
2 1946-01-04
3 1946-01-05
df = dfx.merge(df.assign(Dates=d), on='Dates', how='left')
df['DateTime'] = df['DateTime'].fillna(df['Dates'])
print (df)
Dates DateTime Places w x y z col
0 1946-01-02 1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1 1946-01-02 1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
2 1946-01-02 1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
3 1946-01-03 1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
4 1946-01-04 1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
5 1946-01-05 1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
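(Put end to end, the merge approach runs as follows. This sketch cuts the sample down to two columns, with Places standing in for the remaining data columns, and adds one step not in the original answer: dropping the helper Dates column at the end.)

```python
import pandas as pd

# Rebuild the sample with a DateTime column (times included).
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['1946-01-02 14:45:00', '1946-01-02 22:18:00',
                                '1946-01-02 23:29:00', '1946-01-03 04:28:00',
                                '1946-01-05 04:36:00']),
    'Places': [6.8, 7.6, 6.7, 5.6, 6.5]})

d = df['DateTime'].dt.normalize()          # dates with the time set to midnight
dfx = pd.DataFrame({'Dates': pd.date_range(d.min(), d.max(), freq='D')})

# Left merge keeps every day of the range and all duplicated dates;
# the gap day 1946-01-04 comes through with NaN in every data column.
df = dfx.merge(df.assign(Dates=d), on='Dates', how='left')
df['DateTime'] = df['DateTime'].fillna(df['Dates'])   # gap rows get the bare date
df = df.drop(columns='Dates')              # helper column no longer needed
```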
If only the first column, Date, is in the data, as a DatetimeIndex, build dfx with the dates as its index and join:
df = dfx.join(df)
print (df)
Time Places w x y z col
1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
1946-01-04 NaN NaN NaN NaN NaN NaN NaN
1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
If the DatetimeIndex includes times:

print (df)
Places w x y z col
DateTime
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1
print (df.index)
DatetimeIndex(['1946-01-02 14:45:00', '1946-01-02 22:18:00',
'1946-01-02 23:29:00', '1946-01-03 04:28:00',
'1946-01-05 04:36:00'],
dtype='datetime64[ns]', name='DateTime', freq=None)
First move the index into a regular DateTime column; after that, the normalize-and-merge steps above apply unchanged:

df = df.reset_index()
print (df)
DateTime Places w x y z col
0 1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1 1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
2 1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
3 1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
4 1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1