Python time series: filling NaN from another DataFrame

python pandas dataframe

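I am working with temperature data. I created a file containing multi-year averages for several thousand cities, in the following format (df1).

I have the above data for all 365 days, with no null values. Note that the date column contains only the day and month, because the year is irrelevant.

Based on this data, I am trying to clean the yearly files. My second dataframe has data in the following format (df2).

Each city has a unique ID. The date column is in %y-%m-%d format.

I am trying to replace the null values in the second dataframe with the values from the first dataframe by matching on day and month. This is what I tried: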

df1["Date"] = pd.to_datetime(df1["Date"], errors = 'coerce')   ##date format change##
df1["Date"] = df1['Date'].dt.strftime('%d-%m')
df2 = df2.drop(columns='ID')

df2 = df2.fillna(df1)         ##To replace nulls##

df1["Date"] = pd.to_datetime(df1["Date"], errors = 'coerce')
df1["Date"] = df1['Date'].dt.strftime('%Y-%m-%d')      ## Change data back to original format##

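Even so, I still get null values in my yearly file, i.e. df2. {Note: df1 has no null values}

Please suggest a better way to replace only the null values, or any corrections to the code, if necessary.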
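One plausible reason the attempt leaves NaN values, offered as a sketch rather than a diagnosis of the actual files: when fillna is given a DataFrame, pandas aligns it on row and column labels, not on the contents of the Date column. The miniature frames below (df1_small, df2_small) are hypothetical stand-ins for df1 and df2.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frames, not the real files.
df1_small = pd.DataFrame({"Date": ["01-01", "02-01"], "TMIN": [0.36, 0.37]})
df2_small = pd.DataFrame(
    {"Date": ["2020-01-02", "2020-01-01"], "TMIN": [np.nan, np.nan]},
    index=[10, 11],  # row labels that do not occur in df1_small
)

# Row labels 10 and 11 have no counterpart in df1_small, so nothing is filled.
no_match = df2_small.fillna(df1_small)

# With matching row labels the gaps are filled, but by row label only:
# row 0 (2 January) receives the value stored for "01-01".
by_label = df2_small.reset_index(drop=True).fillna(df1_small)

print(no_match)
print(by_label)
```

In both cases the Date column plays no part in the matching, which is why the day and month have to be moved into the index (or a merge key) first.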

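We can add a column Date2 to df2 in the same format as the Date column of df1. Then, with both dataframes indexed by this date format and the city, we perform an update on df2, as follows: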

df2["Date2"] = pd.to_datetime(df2["Date"], errors = 'coerce').dt.strftime('%d-%b')          #  dd-Mon (e.g. 01-Jan), same format as df1["Date"]

df2a = df2.set_index(['Date2', 'City'])        # Create df2a from df2 with set index on Date2 and City

df2a.update(df1.set_index(['Date', 'City']), overwrite=False)   # update only NaN values of df2a by corresponding values of df1

df2 = df2a.reset_index(level=1).reset_index(drop=True)    # result put back to df2 throwing away the temp `Date2` row index

df2.insert(2, 'City', df2.pop('City'))    # relocate column City back to its original position

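DataFrame.update() modifies a dataframe in place using non-NA values from another dataframe. The length of the dataframe does not grow as a result of the update; only values at matching index/column labels are updated. We therefore give both dataframes the same row index, so that the update is applied to the corresponding columns that share the same column labels.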
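A minimal sketch of that alignment behaviour, using two small hypothetical frames (target and source) rather than the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical frames indexed by a day-month label.
target = pd.DataFrame({"TMIN": [np.nan, 4.9]}, index=["01-Jan", "02-Jan"])
source = pd.DataFrame(
    {"TMIN": [0.36, 0.37, 3.04], "TMAX": [3.54, 3.55, 9.14]},
    index=["01-Jan", "02-Jan", "01-Feb"],
)

# update() works in place and returns None.
result = target.update(source)

# The extra row ("01-Feb") and the extra column ("TMAX") of source are ignored.
print(target)
```

Note that this call uses the default overwrite=True, so the existing 4.9 is replaced as well.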

Note that we pass the parameter overwrite=False to update() to ensure that only the NaN values in the original dataframe df2 are updated.
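The difference between the two settings can be seen on a tiny hypothetical frame (base and averages are illustrative names, not part of the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical yearly values with one gap, and the long-term averages.
base = pd.DataFrame({"TAVG": [-1.90, np.nan]}, index=["01-Jan", "02-Jan"])
averages = pd.DataFrame({"TAVG": [1.95, 1.96]}, index=["01-Jan", "02-Jan"])

# overwrite=False: only the NaN is filled, the measured -1.90 is kept.
only_nan = base.copy()
only_nan.update(averages, overwrite=False)

# overwrite=True (the default): every matching value is replaced.
everything = base.copy()
everything.update(averages)

print(only_nan)
print(everything)
```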

Demo

Data setup:

Data was added to df1 to show values in df2 being replaced from df1:

print(df1)

     Date    City  PRCP   TMAX  TMIN  TAVG
0  01-Jan  Zurich  0.94   3.54  0.36  1.95
1  02-Jan  Zurich  0.95   3.55  0.37  1.96       <=== Added this row
2  01-Feb  Zurich  4.12   9.14  3.04  6.09
3  01-Mar  Zurich  4.10   5.90  0.30  3.10
4  01-Apr  Zurich  0.32  13.78  4.22  9.00
5  01-May  Zurich  9.42  11.32  5.34  8.33

print(df2)       #  before processing

      ID        Date    City  PRCP  TAVG  TMAX  TMIN
0  abcd1  2020-01-01  Zurich   0.0 -1.90  -0.9   NaN         <=== with NaN value
1  abcd1  2020-01-02  Zurich   9.1   NaN  12.7   4.9         <=== with NaN value
2  abcd1  2020-01-03  Zurich   0.8  8.55  13.2   3.9
3  abcd1  2020-01-04  Zurich   0.0  4.10  10.8  -2.6
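To reproduce the two frames printed above, they could be built along these lines. This is a sketch: the values are copied from the printout, while storing Date as plain strings is an assumption.

```python
import numpy as np
import pandas as pd

# Multi-year averages, one row per (day-month, city).
df1 = pd.DataFrame({
    "Date": ["01-Jan", "02-Jan", "01-Feb", "01-Mar", "01-Apr", "01-May"],
    "City": ["Zurich"] * 6,
    "PRCP": [0.94, 0.95, 4.12, 4.10, 0.32, 9.42],
    "TMAX": [3.54, 3.55, 9.14, 5.90, 13.78, 11.32],
    "TMIN": [0.36, 0.37, 3.04, 0.30, 4.22, 5.34],
    "TAVG": [1.95, 1.96, 6.09, 3.10, 9.00, 8.33],
})

# Yearly file with two missing values (Date kept as strings: an assumption).
df2 = pd.DataFrame({
    "ID": ["abcd1"] * 4,
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
    "City": ["Zurich"] * 4,
    "PRCP": [0.0, 9.1, 0.8, 0.0],
    "TAVG": [-1.90, np.nan, 8.55, 4.10],
    "TMAX": [-0.9, 12.7, 13.2, 10.8],
    "TMIN": [np.nan, 4.9, 3.9, -2.6],
})

print(df1)
print(df2)
```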

Result:

print(df2)


      ID       Date    City  PRCP  TAVG  TMAX  TMIN
0  abcd1 2020-01-01  Zurich   0.0 -1.90  -0.9  0.36     <== TMIN updated with df1 value
1  abcd1 2020-01-02  Zurich   9.1  1.96  12.7  4.90     <== TAVG updated with df1 value
2  abcd1 2020-01-03  Zurich   0.8  8.55  13.2  3.90
3  abcd1 2020-01-04  Zurich   0.0  4.10  10.8 -2.60
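As an alternative sketch (not the method used in this answer), the same fill can be done without changing the index of the yearly frame, by looking up one row of averages per yearly row and passing the result to fillna. The function name fill_from_averages and the small frames are hypothetical. Note that %b produces the locale's abbreviated month name, so this assumes an English locale, as the dates in df1 do.

```python
import numpy as np
import pandas as pd

def fill_from_averages(yearly, averages, value_cols):
    """Fill NaN in yearly[value_cols] from averages, matched on (dd-Mon, City)."""
    # Day-month key in the same "01-Jan" form as averages["Date"].
    key = pd.to_datetime(yearly["Date"], errors="coerce").dt.strftime("%d-%b")
    lookup = averages.set_index(["Date", "City"])[value_cols]
    # One lookup row per yearly row; keys without a match give NaN.
    wanted = pd.MultiIndex.from_arrays([key, yearly["City"]])
    fallback = lookup.reindex(index=wanted)
    fallback.index = yearly.index  # line the fallback up with the yearly rows
    out = yearly.copy()
    out[value_cols] = out[value_cols].fillna(fallback)
    return out

# Hypothetical miniature data in the same layout as df1 and df2.
averages = pd.DataFrame({
    "Date": ["01-Jan", "02-Jan"],
    "City": ["Zurich", "Zurich"],
    "TAVG": [1.95, 1.96],
    "TMIN": [0.36, 0.37],
})
yearly = pd.DataFrame({
    "ID": ["abcd1"] * 3,
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "City": ["Zurich"] * 3,
    "TAVG": [-1.90, np.nan, 8.55],
    "TMIN": [np.nan, 4.9, 3.9],
})

filled = fill_from_averages(yearly, averages, ["TAVG", "TMIN"])
print(filled)
```

This keeps the original row order and column order, so no columns need to be moved back afterwards. It relies on each (Date, City) pair appearing only once in the averages.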

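Correct me if I'm wrong, but df1 has no null values, so did you mean df2 = df1.fillna(df2)?

Unrelated to getting pandas to produce what you want, but relevant to the "is there a better way to replace null values" question: you may want to look at sklearn. It has some built-in tools for imputing missing values.

@HenryEcker Yes, that's what I meant, sorry for the typo.

Let me know if the answer needs any further clarification!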