Python: ValueError when filling missing values using groupby
I am trying to fill in the missing values in the "Date" column of my dataset.
CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 08-01-2020 -3.55 4.7 -11.8
CA003033890 Lethbridge 09-01-2020 -17.05 -11.5 -22.6
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
CA003033890 Lethbridge 11-01-2020 -7.8 0.7 -16.3
CA003033890 Lethbridge 12-01-2020 -20.3 -16.3 -24.3
CA003033890 Lethbridge 13-01-2020 -24.6 -22.4 -26.8
CA003033890 Lethbridge 14-01-2020 -27 -23.7 -30.3
CA003033890 Lethbridge 15-01-2020 -29.55 -26.8 -32.3
CA003033890 Lethbridge 16-01-2020 -26.05 -23.2 -28.9
CA003033890 Lethbridge 17-01-2020 -23.45 -19.2 -27.7
For the CODE CA003033890 above, note that the dates from 01-01-2020 to 07-01-2020 are missing. Similarly, for other CODEs, Date values are missing at random.
Here is the code I tried:
data.Date=pd.to_datetime(data.Date)
merge_df = data.set_index('Date').groupby('CODE').apply(lambda x : x.resample('D').max().ffill()).reset_index(level=1)
When I run it, it seems to run forever and later returns the error below:
Traceback (most recent call last):
  File "test.py", line 45, in <module>
    data['Date'] = data.groupby('CODE')['Date'].apply(lambda d: d.reindex(pd.date_range(min(df1.Date),max(df1.Date),freq='D'))).drop('CODE', axis=1).reset_index('CODE').fillna(value=None)
  File "C:\Python\Python38\lib\site-packages\pandas\core\series.py", line 4132, in drop
    return super().drop(
  File "C:\Python\Python38\lib\site-packages\pandas\core\generic.py", line 3923, in drop
    axis_name = self._get_axis_name(axis)
  File "C:\Python\Python38\lib\site-packages\pandas\core\generic.py", line 420, in _get_axis_name
    raise ValueError(f"No axis named {axis} for object type {cls}")
ValueError: No axis named 1 for object type <class 'pandas.core.series.Series'>
Also, is there a faster way to achieve this?

You can create a MultiIndex within each group, reindex against it, and then call reset_index:
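df_list = []
for code, group in df.groupby('CODE'):
    # Every day from the 1st of the month up to this group's latest date
    idx = pd.MultiIndex.from_product(
        [group['CODE'].unique(),
         pd.date_range(group['Date'].max().replace(day=1), end=group['Date'].max(), freq='D')],
        names=['CODE', 'Date'])
    group = group.set_index(['CODE', 'Date']).reindex(idx).reset_index()
    # Back-fill City on the newly created rows (same as fillna(method='bfill'))
    group['City'] = group['City'].bfill()
    df_list.append(group)
new_df = pd.concat(df_list, ignore_index=True)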
An MWE:
import pandas as pd
from io import StringIO

TESTDATA = StringIO("""CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 08-01-2020 -3.55 4.7 -11.8
CA003033890 Lethbridge 09-01-2020 -17.05 -11.5 -22.6
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
CA003033890 Lethbridge 11-01-2020 -7.8 0.7 -16.3
CA003033890 Lethbridge 12-01-2020 -20.3 -16.3 -24.3
CA003033890 Lethbridge 13-01-2020 -24.6 -22.4 -26.8
CA003033890 Lethbridge 14-01-2020 -27 -23.7 -30.3
CA003033890 Lethbridge 15-01-2020 -29.55 -26.8 -32.3
CA003033890 Lethbridge 16-01-2020 -26.05 -23.2 -28.9
CA003033890 Lethbridge 17-01-2020 -23.45 -19.2 -27.7
CA003033891 abc 11-01-2020 -24.6 -22.4 -26.8
CA003033891 abc 14-01-2020 -27 -23.7 -30.3
CA003033891 abc 15-01-2020 -23.45 -19.2 -27.7
""")

# Whitespace-delimited input (equivalent to delim_whitespace=True)
df = pd.read_csv(TESTDATA, sep=r'\s+')
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

df_list = []
for code, group in df.groupby('CODE'):
    # Every day from the 1st of the month up to this group's latest date
    idx = pd.MultiIndex.from_product(
        [group['CODE'].unique(),
         pd.date_range(group['Date'].max().replace(day=1), end=group['Date'].max(), freq='D')],
        names=['CODE', 'Date'])
    group = group.set_index(['CODE', 'Date']).reindex(idx).reset_index()
    # Back-fill City on the newly created rows (same as fillna(method='bfill'))
    group['City'] = group['City'].bfill()
    df_list.append(group)
new_df = pd.concat(df_list, ignore_index=True)
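As for the ValueError itself: the failing line in the traceback (which differs from the snippet posted in the question) selects the single column 'Date' from the groupby, so the result of apply is most likely a Series, and a Series has only axis 0. Calling .drop('CODE', axis=1) on it is therefore rejected. The following is a minimal sketch that reproduces the same kind of error on a toy Series; the values are made up for illustration and are not from the question's data.

```python
import pandas as pd

# Toy Series standing in for the result of groupby('CODE')['Date'].apply(...)
s = pd.Series([1, 2, 3], name='Date')

try:
    # A Series has no columns axis, so axis=1 is not a valid axis here
    s.drop('CODE', axis=1)
except ValueError as exc:
    message = str(exc)
    print(message)
```

Dropping the .drop('CODE', axis=1) call, or working on the DataFrame rather than on a single selected column, avoids this particular error.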
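@TNT That seems to have little to do with the MWE I posted. Could you elaborate?
Not sure what went wrong, but when I run this it does not create a unique (1-to-1) mapping between 'CODE' and 'City'. Instead it creates duplicate mappings, e.g. 100 different cities mapped to one 'CODE'.
@TNT The example you gave is exactly that: many identical cities mapped to one code. If you can provide the dataset, I am happy to help.
So CODE and City map one-to-one? Then the problem may be
pd.MultiIndex.from_product([df['CODE'].unique(), df['City'].unique()])
. If you try something like that, you may find where the error is.
"""CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
CA003033890 Lethbridge 11-01-2020 -7.8 0.7 -16.3
CA0033890 Lethbridge 12-01-2020 -20.3 -16.3 -24.3
CA003033891 abc 13-01-2020 -24.6 -22.4 -26.8
CA0033891 abc 14-01-2020 -27 -23.7 -30.3
CA003033891 abc 17-01-2020 -23.45 -19.2 -27.7"""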
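Whether CODE and City really map one-to-one, as discussed in the comments, can be checked directly before reindexing. Below is a small sketch under the assumption that the columns are named 'CODE' and 'City' as in the question; the second city 'xyz' is invented here purely to show what a violation looks like.

```python
import pandas as pd

# Hypothetical data: the second CODE is deliberately mapped to two cities
df = pd.DataFrame({
    'CODE': ['CA003033890', 'CA003033890', 'CA003033891', 'CA003033891'],
    'City': ['Lethbridge', 'Lethbridge', 'abc', 'xyz'],
})

# Count the distinct City values seen for each CODE
cities_per_code = df.groupby('CODE')['City'].nunique()

# Any CODE listed here breaks the one-to-one assumption
ambiguous = cities_per_code[cities_per_code > 1]
print(ambiguous)
```

If this comes back empty on the real dataset, back-filling City within each CODE group should be safe. If not, the duplicate mappings need to be resolved first.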
CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 01-01-2020
CA003033890 Lethbridge 02-01-2020
CA003033890 Lethbridge 03-01-2020
CA003033890 Lethbridge 04-01-2020
CA003033890 Lethbridge 05-01-2020
CA003033890 Lethbridge 06-01-2020
CA003033890 Lethbridge 07-01-2020
CA003033890 Lethbridge 08-01-2020 -3.55 4.7 -11.8
CA003033890 Lethbridge 09-01-2020 -17.05 -11.5 -22.6
CA003033890 Lethbridge 10-01-2020 -13.7 -1.9 -25.5
# print(new_df)
CODE Date City TAVG TMAX TMIN
0 CA003033890 2020-01-01 Lethbridge NaN NaN NaN
1 CA003033890 2020-01-02 Lethbridge NaN NaN NaN
2 CA003033890 2020-01-03 Lethbridge NaN NaN NaN
3 CA003033890 2020-01-04 Lethbridge NaN NaN NaN
4 CA003033890 2020-01-05 Lethbridge NaN NaN NaN
5 CA003033890 2020-01-06 Lethbridge NaN NaN NaN
6 CA003033890 2020-01-07 Lethbridge NaN NaN NaN
7 CA003033890 2020-01-08 Lethbridge -3.55 4.7 -11.8
8 CA003033890 2020-01-09 Lethbridge -17.05 -11.5 -22.6
9 CA003033890 2020-01-10 Lethbridge -13.70 -1.9 -25.5
10 CA003033890 2020-01-11 Lethbridge -7.80 0.7 -16.3
11 CA003033890 2020-01-12 Lethbridge -20.30 -16.3 -24.3
12 CA003033890 2020-01-13 Lethbridge -24.60 -22.4 -26.8
13 CA003033890 2020-01-14 Lethbridge -27.00 -23.7 -30.3
14 CA003033890 2020-01-15 Lethbridge -29.55 -26.8 -32.3
15 CA003033890 2020-01-16 Lethbridge -26.05 -23.2 -28.9
16 CA003033890 2020-01-17 Lethbridge -23.45 -19.2 -27.7
17 CA003033891 2020-01-01 abc NaN NaN NaN
18 CA003033891 2020-01-02 abc NaN NaN NaN
19 CA003033891 2020-01-03 abc NaN NaN NaN
20 CA003033891 2020-01-04 abc NaN NaN NaN
21 CA003033891 2020-01-05 abc NaN NaN NaN
22 CA003033891 2020-01-06 abc NaN NaN NaN
23 CA003033891 2020-01-07 abc NaN NaN NaN
24 CA003033891 2020-01-08 abc NaN NaN NaN
25 CA003033891 2020-01-09 abc NaN NaN NaN
26 CA003033891 2020-01-10 abc NaN NaN NaN
27 CA003033891 2020-01-11 abc -24.60 -22.4 -26.8
28 CA003033891 2020-01-12 abc NaN NaN NaN
29 CA003033891 2020-01-13 abc NaN NaN NaN
30 CA003033891 2020-01-14 abc -27.00 -23.7 -30.3
31 CA003033891 2020-01-15 abc -23.45 -19.2 -27.7
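On the question of speed, one possible variant (a sketch, not the answer's method) builds the full CODE/Date calendar once and left-merges the observations onto it, instead of reindexing each group separately. It assumes each CODE has exactly one City, uses a shortened copy of the sample data, and the names last_date, calendar and city_by_code are introduced here only for illustration. Whether it is actually faster depends on the data, so it is worth timing on the real dataset.

```python
from io import StringIO

import pandas as pd

# Shortened copy of the sample data, for illustration only
TESTDATA = StringIO("""CODE City Date TAVG TMAX TMIN
CA003033890 Lethbridge 08-01-2020 -3.55 4.7 -11.8
CA003033890 Lethbridge 09-01-2020 -17.05 -11.5 -22.6
CA003033891 abc 11-01-2020 -24.6 -22.4 -26.8
CA003033891 abc 14-01-2020 -27 -23.7 -30.3
""")
df = pd.read_csv(TESTDATA, sep=r'\s+')
df['Date'] = pd.to_datetime(df['Date'], format='%d-%m-%Y')

# Latest observed date per CODE
last_date = df.groupby('CODE')['Date'].max()

# One row per (CODE, day), from the 1st of the month to that CODE's latest date
calendar = pd.concat(
    [pd.DataFrame({'Date': pd.date_range(end.replace(day=1), end, freq='D')}).assign(CODE=code)
     for code, end in last_date.items()],
    ignore_index=True,
)

# CODE -> City lookup (assumes a one-to-one mapping)
city_by_code = df.drop_duplicates('CODE').set_index('CODE')['City']

# Left-merge the observations; days without data keep NaN in the value columns
new_df = calendar.merge(df.drop(columns='City'), on=['CODE', 'Date'], how='left')
new_df['City'] = new_df['CODE'].map(city_by_code)
new_df = new_df[['CODE', 'Date', 'City', 'TAVG', 'TMAX', 'TMIN']]
print(new_df)
```

The result has the same column layout as the output above: every day is present for each CODE, City is filled in, and the temperature columns are NaN on days without observations.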