Python: fill in missing dates in a DataFrame
I have a dataframe that looks like this: it has 8 columns and n rows. The first column is a date with missing days (e.g. 1946-01-04 etc.), but it also contains duplicates (e.g. 1946-01-02). I would like code that keeps those duplicates but also fills in the missing dates, adding NaN to the other cells of each inserted row.

I tried this:

dfx = pd.DataFrame(None, index=pd.DatetimeIndex(start=df.地震の発生日時.min(),
                                                end=df.地震の発生日時.max(), freq='D'))
df = df.apply(pd.concat([df, dfx], join='outer', axis=1))

but it just appends the dates from .min() to .max() at the end of the file... I want them inserted into the data, like this:
Date       Time     Places w    x     y  z   col
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-03 04:36:00 6.5 35.5 139.5 50 3 1
1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
1946-01-06 10:56:00 8.1 41.5 143.4 51 5.2 3
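(One concrete problem with the attempt above: the pd.DatetimeIndex(start=..., end=..., freq=...) constructor was deprecated and later removed from pandas. pd.date_range is the supported way to build such an index; a minimal sketch, using the dates from the example:)

```python
import pandas as pd

# pd.date_range replaces the removed DatetimeIndex(start=..., end=..., freq=...)
# form: one entry per day between the two endpoints, inclusive.
idx = pd.date_range(start='1946-01-02', end='1946-01-06', freq='D')

# Empty frame with one row per day, ready to be combined with the data.
dfx = pd.DataFrame(index=idx)
```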
By the way, I can't use an inner join; it throws:

AttributeError: 'Places' is not a valid function for 'Series' object
Solution if the first column is a DatetimeIndex of dates without times:
print (df)
Time Places w x y z col
Date
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1
print (df.index)
DatetimeIndex(['1946-01-02', '1946-01-02', '1946-01-02', '1946-01-03',
'1946-01-05'],
dtype='datetime64[ns]', name='Date', freq=None)
Create a new DataFrame covering the full daily date range, and then join it to the original.
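(The code for this step did not survive here; a minimal sketch of what it presumably looked like, assuming the daily range is built directly as the new frame's index. The sample data is a cut-down version of the example above.)

```python
import pandas as pd

# Sample frame indexed by dates without times; note the duplicated
# 1946-01-02 and the gap between 1946-01-02 and 1946-01-05.
df = pd.DataFrame({'Time': ['14:45:00', '22:18:00', '04:36:00'],
                   'Places': [6.8, 7.6, 6.5]},
                  index=pd.DatetimeIndex(['1946-01-02', '1946-01-02',
                                          '1946-01-05'], name='Date'))

# Empty frame whose index is the complete daily range.
dfx = pd.DataFrame(index=pd.date_range(df.index.min(), df.index.max(),
                                       freq='D'))

# Left join on the index keeps every day: duplicated dates stay duplicated,
# and the missing days (1946-01-03, 1946-01-04) get NaN in every column.
df = dfx.join(df)
```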
If instead there is a DatetimeIndex that includes times, first create a DateTime column from the index, then strip the times with Series.dt.normalize, merge on the normalized dates, and finally replace the missing values in the DateTime column:
# normalize strips the time component, leaving the date at midnight
d = df['DateTime'].dt.normalize()
dfx = pd.DataFrame({'Dates':pd.date_range(start=d.min(),
                            end=d.max(), freq='D')})
print (dfx)
Dates
0 1946-01-02
1 1946-01-03
2 1946-01-04
3 1946-01-05
df = dfx.merge(df.assign(Dates=d), on='Dates', how='left')
df['DateTime'] = df['DateTime'].fillna(df['Dates'])
print (df)
Dates DateTime Places w x y z col
0 1946-01-02 1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1 1946-01-02 1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
2 1946-01-02 1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
3 1946-01-03 1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
4 1946-01-04 1946-01-04 00:00:00 NaN NaN NaN NaN NaN NaN
5 1946-01-05 1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
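(Put end to end, the merge approach runs as follows. This sketch cuts the sample down to two columns, with Places standing in for the remaining data columns, and adds one step not in the original answer: dropping the helper Dates column at the end.)

```python
import pandas as pd

# Rebuild the sample with a DateTime column (times included).
df = pd.DataFrame({
    'DateTime': pd.to_datetime(['1946-01-02 14:45:00', '1946-01-02 22:18:00',
                                '1946-01-02 23:29:00', '1946-01-03 04:28:00',
                                '1946-01-05 04:36:00']),
    'Places': [6.8, 7.6, 6.7, 5.6, 6.5]})

d = df['DateTime'].dt.normalize()          # dates with the time set to midnight
dfx = pd.DataFrame({'Dates': pd.date_range(d.min(), d.max(), freq='D')})

# Left merge keeps every day of the range and all duplicated dates;
# the gap day 1946-01-04 comes through with NaN in every data column.
df = dfx.merge(df.assign(Dates=d), on='Dates', how='left')
df['DateTime'] = df['DateTime'].fillna(df['Dates'])   # gap rows get the bare date
df = df.drop(columns='Dates')              # helper column no longer needed
```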
If only the first column, Date, is in the data, as a DatetimeIndex, build dfx with the dates as its index and join:
df = dfx.join(df)
print (df)
Time Places w x y z col
1946-01-02 14:45:00 6.8 36.3 140.1 31.0 3.2 1.0
1946-01-02 22:18:00 7.6 40.5 141.4 0.0 4.6 3.0
1946-01-02 23:29:00 6.7 36.1 139.4 39.0 4.3 2.0
1946-01-03 04:28:00 5.6 34.4 136.5 1.0 4.2 2.0
1946-01-04 NaN NaN NaN NaN NaN NaN NaN
1946-01-05 04:36:00 6.5 35.5 139.5 50.0 3.0 1.0
If the DatetimeIndex includes times:

print (df)
Places w x y z col
DateTime
1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1
print (df.index)
DatetimeIndex(['1946-01-02 14:45:00', '1946-01-02 22:18:00',
'1946-01-02 23:29:00', '1946-01-03 04:28:00',
'1946-01-05 04:36:00'],
dtype='datetime64[ns]', name='DateTime', freq=None)
First move the index into a regular DateTime column; after that, the normalize-and-merge steps above apply unchanged:

df = df.reset_index()
print (df)
DateTime Places w x y z col
0 1946-01-02 14:45:00 6.8 36.3 140.1 31 3.2 1
1 1946-01-02 22:18:00 7.6 40.5 141.4 0 4.6 3
2 1946-01-02 23:29:00 6.7 36.1 139.4 39 4.3 2
3 1946-01-03 04:28:00 5.6 34.4 136.5 1 4.2 2
4 1946-01-05 04:36:00 6.5 35.5 139.5 50 3.0 1