Python: ValueError when filling missing values with groupby


I'm trying to fill in the missing values in the "Date" column of my dataset:

CODE        City        Date         TAVG    TMAX   TMIN
CA003033890 Lethbridge  08-01-2020  -3.55    4.7    -11.8
CA003033890 Lethbridge  09-01-2020  -17.05  -11.5   -22.6
CA003033890 Lethbridge  10-01-2020  -13.7   -1.9    -25.5
CA003033890 Lethbridge  11-01-2020  -7.8     0.7    -16.3
CA003033890 Lethbridge  12-01-2020  -20.3   -16.3   -24.3
CA003033890 Lethbridge  13-01-2020  -24.6   -22.4   -26.8
CA003033890 Lethbridge  14-01-2020  -27     -23.7   -30.3
CA003033890 Lethbridge  15-01-2020  -29.55  -26.8   -32.3
CA003033890 Lethbridge  16-01-2020  -26.05  -23.2   -28.9
CA003033890 Lethbridge  17-01-2020  -23.45  -19.2   -27.7
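For CODE CA003033890 above, note that the dates from 01-01-2020 to 07-01-2020 (1 to 7 January 2020) are missing. Similarly, for the other CODEs, Date values are missing at random.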
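To see exactly which days are absent for each CODE, you can compare each group's dates against a full daily range. This is a minimal sketch that assumes Date has already been parsed with format "%d-%m-%Y". The helper name `missing_dates` is hypothetical. Starting the expected range on day 1 of each group's earliest month is also an assumption, chosen to match the 01-01-2020 start described above.

```python
import pandas as pd

def missing_dates(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each CODE to the daily dates absent from its group."""
    result = {}
    for code, group in df.groupby("CODE"):
        dates = pd.DatetimeIndex(group["Date"])
        # Assumed expected range: day 1 of the earliest month up to the last observed date
        full = pd.date_range(dates.min().replace(day=1), dates.max(), freq="D")
        result[code] = full.difference(dates)
    return result

sample = pd.DataFrame({
    "CODE": ["CA003033890"] * 3,
    "Date": pd.to_datetime(["08-01-2020", "09-01-2020", "10-01-2020"], format="%d-%m-%Y"),
})
print(missing_dates(sample)["CA003033890"])
```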

This is the code I tried:

data.Date=pd.to_datetime(data.Date)
merge_df  = data.set_index('Date').groupby('CODE').apply(lambda x : x.resample('D').max().ffill()).reset_index(level=1)
When I run it, it seems to run forever, and eventually it returns the error below:

Traceback (most recent call last):
  File "test.py", line 45, in <module>
    data['Date'] = data.groupby('CODE')['Date'].apply(lambda d: d.reindex(pd.date_range(min(df1.Date),max(df1.Date),freq='D'))).drop('CODE', axis=1).reset_index('CODE').fillna(value=None)
  File "C:\Python\Python38\lib\site-packages\pandas\core\series.py", line 4132, in drop
    return super().drop(
  File "C:\Python\Python38\lib\site-packages\pandas\core\generic.py", line 3923, in drop
    axis_name = self._get_axis_name(axis)
  File "C:\Python\Python38\lib\site-packages\pandas\core\generic.py", line 420, in _get_axis_name
    raise ValueError(f"No axis named {axis} for object type {cls}")
ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>
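The traceback points at a different line from the resample snippet. On that line, `groupby('CODE')['Date'].apply(...)` returns a Series, and a Series has only axis 0, so `.drop('CODE', axis=1)` raises this ValueError. Here is a toy reproduction using a made-up Series rather than the asker's data:

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="Date")

caught = None
try:
    # A Series has only axis 0, so dropping along axis=1 is rejected
    s.drop("CODE", axis=1)
except ValueError as exc:
    caught = exc
print(caught)

# Dropping a column needs a DataFrame, e.g. one built from the Series
frame = s.to_frame().assign(CODE="CA003033890")
remaining = frame.drop("CODE", axis=1).columns.tolist()
print(remaining)
```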

Also, is there a faster way to do this?

You can build a MultiIndex for each group, reindex with it, and then reset_index:

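# assumes df['Date'] has already been converted with pd.to_datetime
df_list = []
for code, group in df.groupby('CODE'):
    # every (CODE, Date) pair from day 1 of the group's first month to its last date
    # (min() rather than max() here, so rows from earlier months are not dropped)
    idx = pd.MultiIndex.from_product([group['CODE'].unique(),
                                      pd.date_range(group['Date'].min().replace(day=1),
                                                    end=group['Date'].max(), freq='D')],
                                     names=['CODE', 'Date'])
    # reindexing inserts NaN rows for the missing dates
    group = group.set_index(['CODE', 'Date']).reindex(idx).reset_index()
    # City is constant per CODE, so fill it backwards from the next known row
    group['City'] = group['City'].bfill()
    df_list.append(group)
new_df = pd.concat(df_list, ignore_index=True)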
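The loop above builds and concatenates one small DataFrame per CODE. One possible speed-up, not benchmarked here, is to build the complete (CODE, Date) index once and call reindex a single time on the whole frame. This is a minimal sketch that assumes the same CODE/City/Date columns, dates already parsed, and unique (CODE, Date) pairs. It keeps the loop's per-CODE range: day 1 of the earliest month through the last observed date. The function name `fill_missing_dates` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_missing_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: insert NaN rows for missing days with a single reindex."""
    bounds = df.groupby("CODE")["Date"].agg(["min", "max"])
    # One daily range per CODE: day 1 of the earliest month up to the last observed date
    ranges = [
        pd.date_range(lo.replace(day=1), hi, freq="D")
        for lo, hi in zip(bounds["min"], bounds["max"])
    ]
    # Flatten all ranges into one (CODE, Date) MultiIndex
    codes = np.repeat(bounds.index.to_numpy(), [len(r) for r in ranges])
    dates = pd.DatetimeIndex(np.concatenate([r.to_numpy() for r in ranges]))
    full_index = pd.MultiIndex.from_arrays([codes, dates], names=["CODE", "Date"])
    # Single reindex over the whole frame (assumes (CODE, Date) pairs are unique)
    out = df.set_index(["CODE", "Date"]).reindex(full_index).reset_index()
    # City is constant per CODE, so map it back from the observed rows
    out["City"] = out["CODE"].map(df.groupby("CODE")["City"].first())
    return out

sample = pd.DataFrame({
    "CODE": ["CA003033890"] * 3 + ["CA003033891"] * 2,
    "City": ["Lethbridge"] * 3 + ["abc"] * 2,
    "Date": pd.to_datetime(
        ["08-01-2020", "09-01-2020", "10-01-2020", "11-01-2020", "14-01-2020"],
        format="%d-%m-%Y",
    ),
    "TAVG": [-3.55, -17.05, -13.7, -24.6, -27.0],
})
result = fill_missing_dates(sample)
print(result)
```

Whether this is actually faster depends on the number of CODEs and rows, so it is worth timing both versions on the real dataset.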
A MWE:

import pandas as pd
from io import StringIO

# whitespace-separated sample data
TESTDATA = StringIO("""CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 08-01-2020 -3.55 4.7 -11.8
CA003033890 Lethbridge 09-01-2020 -17.05 -11.5 -22.6
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
CA003033890 Lethbridge 11-01-2020 -7.8 0.7 -16.3
CA003033890 Lethbridge 12-01-2020 -20.3 -16.3 -24.3
CA003033890 Lethbridge 13-01-2020 -24.6 -22.4 -26.8
CA003033890 Lethbridge 14-01-2020 -27 -23.7 -30.3
CA003033890 Lethbridge 15-01-2020 -29.55 -26.8 -32.3
CA003033890 Lethbridge 16-01-2020 -26.05 -23.2 -28.9
CA003033890 Lethbridge 17-01-2020 -23.45 -19.2 -27.7
CA003033891 abc 11-01-2020 -24.6 -22.4 -26.8
CA003033891 abc 14-01-2020 -27 -23.7 -30.3
CA003033891 abc 15-01-2020 -23.45 -19.2 -27.7
""")

df = pd.read_csv(TESTDATA, sep=r'\s+')
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

df_list = []
for code, group in df.groupby('CODE'):
    idx = pd.MultiIndex.from_product([group['CODE'].unique(),
                                      pd.date_range(group['Date'].min().replace(day=1),
                                                    end=group['Date'].max(), freq='D')],
                                     names=['CODE', 'Date'])
    group = group.set_index(['CODE', 'Date']).reindex(idx).reset_index()
    group['City'] = group['City'].bfill()
    df_list.append(group)
new_df = pd.concat(df_list, ignore_index=True)
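@TNT That seems to have nothing to do with the MWE I posted. Could you elaborate?
Not sure what went wrong, but when I run this, it doesn't create a unique (one-to-one) mapping between "CODE" and "City". Instead, it creates duplicate mappings, e.g. 100 different cities mapped to one "CODE".
@TNT In the sample you gave, many identical cities map to one code. If you can provide the dataset, I'd be happy to help. So CODE and City are one-to-one? Then the problem may be pd.MultiIndex.from_product([df['CODE'].unique(), df['City'].unique()]). If you try something like that, you may find where the error is.
CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
CA003033890 Lethbridge 11-01-2020 -7.8 0.7 -16.3
CA0033890 Lethbridge 12-01-2020 -20.3 -16.3 -24.3
CA003033891 abc 13-01-2020 -24.6 -22.4 -26.8
CA0033891 abc 14-01-2020 -27 -23.7 -30.3
CA003033891 abc 17-01-2020 -23.45 -19.2 -27.7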
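Why from_product is the suspect: it pairs every value of the first list with every value of the second. Feeding it all CODEs and all Cities at once therefore yields every CODE/City combination, not the true one-to-one pairs. Here is a toy illustration with made-up codes and cities, not the asker's data. Building the index from the observed rows, via `MultiIndex.from_frame` on the de-duplicated pairs, keeps only the combinations that actually occur.

```python
import pandas as pd

pairs = pd.DataFrame({"CODE": ["C1", "C2", "C3"], "City": ["X", "Y", "Z"]})

# Cartesian product: every CODE combined with every City (3 x 3 = 9 entries)
product = pd.MultiIndex.from_product(
    [pairs["CODE"].unique(), pairs["City"].unique()], names=["CODE", "City"]
)

# Only the observed pairs (3 entries), preserving the one-to-one mapping
observed = pd.MultiIndex.from_frame(pairs.drop_duplicates())

print(len(product), len(observed))
```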
Desired output:
CODE        City        Date        TAVG    TMAX    TMIN
CA003033890 Lethbridge  01-01-2020          
CA003033890 Lethbridge  02-01-2020          
CA003033890 Lethbridge  03-01-2020          
CA003033890 Lethbridge  04-01-2020          
CA003033890 Lethbridge  05-01-2020          
CA003033890 Lethbridge  06-01-2020          
CA003033890 Lethbridge  07-01-2020          
CA003033890 Lethbridge  08-01-2020  -3.55    4.7    -11.8
CA003033890 Lethbridge  09-01-2020  -17.05  -11.5   -22.6
CA003033890 Lethbridge  10-01-2020  -13.7   -1.9    -25.5
# print(new_df)

           CODE       Date        City   TAVG  TMAX  TMIN
0   CA003033890 2020-01-01  Lethbridge    NaN   NaN   NaN
1   CA003033890 2020-01-02  Lethbridge    NaN   NaN   NaN
2   CA003033890 2020-01-03  Lethbridge    NaN   NaN   NaN
3   CA003033890 2020-01-04  Lethbridge    NaN   NaN   NaN
4   CA003033890 2020-01-05  Lethbridge    NaN   NaN   NaN
5   CA003033890 2020-01-06  Lethbridge    NaN   NaN   NaN
6   CA003033890 2020-01-07  Lethbridge    NaN   NaN   NaN
7   CA003033890 2020-01-08  Lethbridge  -3.55   4.7 -11.8
8   CA003033890 2020-01-09  Lethbridge -17.05 -11.5 -22.6
9   CA003033890 2020-01-10  Lethbridge -13.70  -1.9 -25.5
10  CA003033890 2020-01-11  Lethbridge  -7.80   0.7 -16.3
11  CA003033890 2020-01-12  Lethbridge -20.30 -16.3 -24.3
12  CA003033890 2020-01-13  Lethbridge -24.60 -22.4 -26.8
13  CA003033890 2020-01-14  Lethbridge -27.00 -23.7 -30.3
14  CA003033890 2020-01-15  Lethbridge -29.55 -26.8 -32.3
15  CA003033890 2020-01-16  Lethbridge -26.05 -23.2 -28.9
16  CA003033890 2020-01-17  Lethbridge -23.45 -19.2 -27.7
17  CA003033891 2020-01-01         abc    NaN   NaN   NaN
18  CA003033891 2020-01-02         abc    NaN   NaN   NaN
19  CA003033891 2020-01-03         abc    NaN   NaN   NaN
20  CA003033891 2020-01-04         abc    NaN   NaN   NaN
21  CA003033891 2020-01-05         abc    NaN   NaN   NaN
22  CA003033891 2020-01-06         abc    NaN   NaN   NaN
23  CA003033891 2020-01-07         abc    NaN   NaN   NaN
24  CA003033891 2020-01-08         abc    NaN   NaN   NaN
25  CA003033891 2020-01-09         abc    NaN   NaN   NaN
26  CA003033891 2020-01-10         abc    NaN   NaN   NaN
27  CA003033891 2020-01-11         abc -24.60 -22.4 -26.8
28  CA003033891 2020-01-12         abc    NaN   NaN   NaN
29  CA003033891 2020-01-13         abc    NaN   NaN   NaN
30  CA003033891 2020-01-14         abc -27.00 -23.7 -30.3
31  CA003033891 2020-01-15         abc -23.45 -19.2 -27.7
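For comparison, the groupby + resample idea from the question can also produce one row per day. This is a minimal sketch on a made-up two-station frame; the codes "A" and "B" are placeholders. Note that `asfreq()` only fills gaps between each CODE's first and last observed dates. Unlike the reindex loop, it does not prepend days back to the 1st of the month.

```python
import pandas as pd

df = pd.DataFrame({
    "CODE": ["A", "A", "B", "B"],
    "City": ["Lethbridge", "Lethbridge", "abc", "abc"],
    "Date": pd.to_datetime(["08-01-2020", "10-01-2020", "11-01-2020", "14-01-2020"],
                           format="%d-%m-%Y"),
    "TAVG": [-3.55, -13.7, -24.6, -27.0],
})

# Resample each CODE to daily frequency; asfreq() inserts NaN rows for missing days
daily = (
    df.set_index("Date")
      .groupby("CODE")[["City", "TAVG"]]
      .resample("D")
      .asfreq()
      .reset_index()
)
# City is constant per CODE, so carry it forward within each group
daily["City"] = daily.groupby("CODE")["City"].ffill()
print(daily)
```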