Python: filling values in a groupby object after filling in missing dates
Many similar questions have already been asked, and they helped me a lot with this problem. I followed the advice in a couple of them, but it still does not quite do the job. I made a toy dataset to demonstrate the problem I am facing:
import pandas as pd
data = pd.DataFrame({'Date': ['2012-01-01', '2012-01-01','2012-01-01','2012-01-02','2012-01-02','2012-01-02','2012-01-03'], 'Id': ['21','21','22','21','22','23','21'], 'Quantity': ['5','1','4','4','2','1','4'], 'NetAmount': ['66','45','76','35','76','73','45']})
data['Quantity'] = data['Quantity'].astype('int')
data['NetAmount'] = data['NetAmount'].astype('float')
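I grouped the dataset as shown in the following code:
# Shift the dates back one week, then total Quantity and NetAmount per Id and week (weeks ending on Monday)
data['Date'] = pd.to_datetime(data.Date) - pd.to_timedelta(7, unit='d')
# Select several columns with a list (double brackets); the tuple form fails on recent pandas
data = data.groupby(['Id', pd.Grouper(key='Date', freq='W-MON')])[['Quantity', 'NetAmount']].sum().reset_index().sort_values('Date')
data1 = data.groupby(['Id', 'Date']).agg({'Quantity': 'sum', 'NetAmount': 'sum'}).reset_index()
Then I fill in the missing dates: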
data2 = data1.set_index(['Date', 'Id','NetAmount']).Quantity.unstack(-3).\
reindex(columns=pd.date_range(data1['Date'].min(), data1['Date'].max(),freq='W-MON'),fill_value=0).\
stack(dropna=False).unstack().stack(dropna=False).\
unstack('NetAmount').stack(dropna=False).fillna(0).reset_index()
This gives the resulting dataframe:
Id level_1 NetAmount 0
0 21 2011-12-26 45.0 0.0
1 21 2011-12-26 73.0 0.0
2 21 2011-12-26 146.0 10.0
3 21 2011-12-26 152.0 0.0
4 21 2012-01-02 45.0 4.0
5 21 2012-01-02 73.0 0.0
6 21 2012-01-02 146.0 0.0
7 21 2012-01-02 152.0 0.0
8 22 2011-12-26 45.0 0.0
9 22 2011-12-26 73.0 0.0
10 22 2011-12-26 146.0 0.0
11 22 2011-12-26 152.0 6.0
12 22 2012-01-02 45.0 0.0
13 22 2012-01-02 73.0 0.0
14 22 2012-01-02 146.0 0.0
15 22 2012-01-02 152.0 0.0
16 23 2011-12-26 45.0 0.0
17 23 2011-12-26 73.0 1.0
18 23 2011-12-26 146.0 0.0
19 23 2011-12-26 152.0 0.0
20 23 2012-01-02 45.0 0.0
21 23 2012-01-02 73.0 0.0
22 23 2012-01-02 146.0 0.0
23 23 2012-01-02 152.0 0.0
But what I actually want is:
0 21 2011-12-26 66.0 5.0
1 21 2011-12-26 45.0 1.0
2 21 2011-12-26 35.0 4.0
3 21 2012-02-02 45.0 4.0
4 22 2011-12-26 76.0 4.0
5 22 2012-02-02 76.0 2.0
6 23 2011-12-26 0.0 0.0
7 23 2012-02-02 73.0 1.0
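The filling works, but I do not understand what is happening in the resulting dataframe: the values in the NetAmount column, for instance, are off. I am not familiar with the unstack/stack functions. Am I missing something in the process? Thanks for your help.
Update: after adding the "0" values, I tried to regroup by Id and Date.
But I get this error: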
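The odd NetAmount values can be reproduced on a smaller scale. The sketch below assumes the four weekly totals visible in the printed frame above (the names `s` and `crossed` are made up for illustration). Because NetAmount is part of the index, unstacking it turns every distinct NetAmount value into a column, and stacking back pairs each (Id, Date) row with all of those values, not only with its own.

```python
import pandas as pd

# Weekly totals per Id, taken from the non-zero rows of the printed frame
data1 = pd.DataFrame({
    'Id': ['21', '21', '22', '23'],
    'Date': pd.to_datetime(['2011-12-26', '2012-01-02', '2011-12-26', '2011-12-26']),
    'Quantity': [10, 4, 6, 1],
    'NetAmount': [146.0, 45.0, 152.0, 73.0],
})

# NetAmount is used as an index level, exactly like in the question's chain
s = data1.set_index(['Id', 'Date', 'NetAmount'])['Quantity']

# unstack makes one column per distinct NetAmount value; stack turns them back
# into rows, so each (Id, Date) pair is now combined with all four NetAmount values
crossed = s.unstack('NetAmount', fill_value=0).stack()

print(len(s), len(crossed))
```

Only four of the resulting rows carry a real Quantity; the others are padding for NetAmount values that belong to a different Id or week. Keeping NetAmount as a value column instead of an index level avoids this.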
Traceback (most recent call last):
File "", line 48, in <module>
data3 = data2.groupby(['Id','Date']).agg({'Quantity': sum, 'NetAmount': sum}).reset_index()
File "", line 7632, in groupby
observed=observed, **kwargs)
File "", line 2110, in groupby
return klass(obj, by, **kwds)
File "", line 360, in __init__
mutated=self.mutated)
File "", line 578, in _get_grouper
raise KeyError(gpr)
KeyError: 'Date'
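A likely cause of this KeyError, judging by the header of the printed frame above (Id, level_1, NetAmount, 0): after reset_index() the date column is labelled level_1 and the quantity column is labelled 0, so there is no column called Date to group on. A minimal sketch, using a small stand-in for data2 that only copies those column labels, is to rename the columns before grouping:

```python
import pandas as pd

# Hypothetical stand-in for data2, with the same column labels as the printed frame
data2 = pd.DataFrame({
    'Id': ['21', '21', '22'],
    'level_1': pd.to_datetime(['2011-12-26', '2012-01-02', '2011-12-26']),
    'NetAmount': [146.0, 45.0, 152.0],
    0: [10.0, 4.0, 6.0],
})

# Give the auto-generated labels their intended names
data2 = data2.rename(columns={'level_1': 'Date', 0: 'Quantity'})

# Grouping by 'Date' now finds the column
data3 = data2.groupby(['Id', 'Date']).agg({'Quantity': 'sum', 'NetAmount': 'sum'}).reset_index()
print(data3)
```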
You need to convert the columns Quantity and NetAmount to numeric:
data['Quantity'] = data['Quantity'].astype('int')
data['NetAmount'] = data['NetAmount'].astype('float')
When the columns are strings, the sum function concatenates all the strings in each group.
Now rerun your code and it should work fine:
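The difference is easy to see on a tiny frame. This is an illustrative sketch with made-up variable names (`df`, `as_strings`, `as_numbers`), using three rows from the toy data:

```python
import pandas as pd

df = pd.DataFrame({'Id': ['21', '21', '22'], 'Quantity': ['5', '1', '4']})

# Quantity still holds strings here, so the group sum joins them end to end
as_strings = df.groupby('Id')['Quantity'].sum()

# After the conversion the same group sum adds the numbers
as_numbers = df.assign(Quantity=df['Quantity'].astype('int')).groupby('Id')['Quantity'].sum()

print(as_strings['21'], as_numbers['21'])
```

For Id 21 the string version yields the text '51', while the numeric version yields 6.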
# Id level_1 NetAmount 0
#0 21 2011-12-26 45.0 0.0
#1 21 2011-12-26 73.0 0.0
#2 21 2011-12-26 146.0 10.0
#3 21 2011-12-26 152.0 0.0
#4 21 2012-01-02 45.0 4.0
#5 21 2012-01-02 73.0 0.0
#6 21 2012-01-02 146.0 0.0
#7 21 2012-01-02 152.0 0.0
#8 22 2011-12-26 45.0 0.0
#9 22 2011-12-26 73.0 0.0
#10 22 2011-12-26 146.0 0.0
#11 22 2011-12-26 152.0 6.0
#12 22 2012-01-02 45.0 0.0
#13 22 2012-01-02 73.0 0.0
#14 22 2012-01-02 146.0 0.0
#15 22 2012-01-02 152.0 0.0
#16 23 2011-12-26 45.0 0.0
#17 23 2011-12-26 73.0 1.0
#18 23 2011-12-26 146.0 0.0
#19 23 2011-12-26 152.0 0.0
#20 23 2012-01-02 45.0 0.0
#21 23 2012-01-02 73.0 0.0
#22 23 2012-01-02 146.0 0.0
#23 23 2012-01-02 152.0 0.0
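One possible way to get a frame with one row per Id and week, and zeros where nothing was sold, is sketched below. It is an alternative to the unstack/stack chain, not something taken from the thread: it keeps NetAmount as a value column and reindexes against every (Id, week) combination built with pd.MultiIndex.from_product. The data1 values are the weekly totals visible in the output above; the names `weeks`, `full_index` and `filled` are made up for illustration.

```python
import pandas as pd

# Weekly totals per Id, as they appear in the printed output
data1 = pd.DataFrame({
    'Id': ['21', '21', '22', '23'],
    'Date': pd.to_datetime(['2011-12-26', '2012-01-02', '2011-12-26', '2011-12-26']),
    'Quantity': [10, 4, 6, 1],
    'NetAmount': [146.0, 45.0, 152.0, 73.0],
})

# Every Monday-ending week between the first and last date
weeks = pd.date_range(data1['Date'].min(), data1['Date'].max(), freq='W-MON')

# Every combination of Id and week, whether or not it occurs in data1
full_index = pd.MultiIndex.from_product([data1['Id'].unique(), weeks], names=['Id', 'Date'])

# Missing combinations get 0 in both Quantity and NetAmount
filled = (
    data1.set_index(['Id', 'Date'])
         .reindex(full_index, fill_value=0)
         .reset_index()
)
print(filled)
```

With three Ids and two weeks this produces six rows, and the two combinations absent from data1 (Id 22 and Id 23 in the week ending 2012-01-02) come out as zero in both columns.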
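It at least got rid of the NaN values! But it still does not work. Those large values should not be possible: I should get mostly 0s, and wherever Quantity is 0, NetAmount should also be 0, since both come from the fill value (0).
When I run the code I get mostly zeros and no longer see those large values. Did you add the code after pd.DataFrame(…) and before data['Date'] = …? I have added the dataframe data2 at the end of the code.
Can you tell me which rows are wrong?
I did. Do you also get 0s in the NetAmount column? I will include an expected-result dataframe in my question.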