Python使用缺少的值填充数据帧
我以这个数据帧为例Python使用缺少的值填充数据帧,python,pandas,Python,Pandas,我以这个数据帧为例 import pandas as pd #create dataframe df = pd.DataFrame([['DE', 'Table',201705,201705, 1000], ['DE', 'Table',201705,201704, 1000],\ ['DE', 'Table',201705,201702, 1000], ['DE', 'Table',201705,201701, 1000],\
import pandas as pd
#create dataframe
df = pd.DataFrame([['DE', 'Table',201705,201705, 1000], ['DE', 'Table',201705,201704, 1000],\
['DE', 'Table',201705,201702, 1000], ['DE', 'Table',201705,201701, 1000],\
['AT', 'Table',201708,201708, 1000], ['AT', 'Table',201708,201706, 1000],\
['AT', 'Table',201708,201705, 1000], ['AT', 'Table',201708,201704, 1000]],\
columns=['ISO','Product','Billed Week', 'Created Week', 'Billings'])
print (df)
ISO Product Billed Week Created Week Billings
0 DE Table 201705 201705 1000
1 DE Table 201705 201704 1000
2 DE Table 201705 201702 1000
3 DE Table 201705 201701 1000
4 AT Table 201708 201708 1000
5 AT Table 201708 201706 1000
6 AT Table 201708 201705 1000
7 AT Table 201708 201704 1000
我需要做的是为每个groupby['ISO','Product']填入一些缺少的数据,其中序列中有一个中断,即在某个星期内没有创建任何账单,因此缺少。它需要基于计费周的最大值和创建周的最小值。也就是说,组合应该是完整的,没有顺序中断
因此,对于上述内容,我需要以编程方式将缺少的记录追加到数据库中,如下所示:
ISO Product Billed Week Created Week Billings
0 DE Table 201705 201703 0
1 AT Table 201708 201707 0
这是我的解决方案。我相信一些天才会提供更好的解决方案~让我们等待~
df1=df.groupby('ISO').agg({'Billed Week' : np.max,'Created Week' : np.min})
df1['ISO']=df1.index
Created Week Billed Week ISO
ISO
AT 201704 201708 AT
DE 201701 201705 DE
ISO=[]
BilledWeek=[]
CreateWeek=[]
for i in range(len(df1)):
BilledWeek.extend([df1.ix[i,1]]*(df1.ix[i,1]-df1.ix[i,0]+1))
CreateWeek.extend(list(range(df1.ix[i,0],df1.ix[i,1]+1)))
ISO.extend([df1.ix[i,2]]*(df1.ix[i,1]-df1.ix[i,0]+1))
DF=pd.DataFrame({'BilledWeek':BilledWeek,'CreateWeek':CreateWeek,'ISO':ISO})
Target=DF.merge(df,left_on=['BilledWeek','CreateWeek','ISO'],right_on=['Billed Week','Created Week','ISO'],how='left')
Target.Billings.fillna(0,inplace=True)
Target=Target.drop(['Billed Week', 'Created Week'],axis=1)
Target['Product']=Target.groupby('ISO')['Product'].ffill()
Out[75]:
BilledWeek CreateWeek ISO Product Billings
0 201708 201704 AT Table 1000.0
1 201708 201705 AT Table 1000.0
2 201708 201706 AT Table 1000.0
3 201708 201707 AT Table 0.0
4 201708 201708 AT Table 1000.0
5 201705 201701 DE Table 1000.0
6 201705 201702 DE Table 1000.0
7 201705 201703 DE Table 0.0
8 201705 201704 DE Table 1000.0
9 201705 201705 DE Table 1000.0
建立一个多索引,填补创建周内的所有空白,然后重新编制原始DF的索引
idx = (df.groupby(['Billed Week'])
.apply(lambda x: [(x['ISO'].min(),
x['Product'].min(),
x['Billed Week'].min(),
e) for e in range(x['Created Week'].min(), x['Created Week'].max()+1)])
.tolist()
)
multi_idx = pd.MultiIndex.from_tuples(sum(idx,[]),names=['ISO','Product','Billed Week','Created Week'])
(df.set_index(['ISO','Product','Billed Week','Created Week'])
.reindex(multi_idx)
.reset_index()
.fillna(0)
)
Out[671]:
ISO Product Billed Week Created Week Billings
0 DE Table 201705 201701 1000.0
1 DE Table 201705 201702 1000.0
2 DE Table 201705 201703 0.0
3 DE Table 201705 201704 1000.0
4 DE Table 201705 201705 1000.0
5 AT Table 201708 201704 1000.0
6 AT Table 201708 201705 1000.0
7 AT Table 201708 201706 1000.0
8 AT Table 201708 201707 0.0
9 AT Table 201708 201708 1000.0
谢谢@Wen,它看起来比我想象中的黑客好4倍。稍后我将不得不检查它,并会让您知道在我的较大数据帧和此测试数据帧上是否适合我的需要。好的,如果代码不适合您,请告诉我~我无意中编辑了您的答案,但我已将其还原。对不起,艾伦,没问题!
def seqfix(x):
s = x['Created Week']
x = x.set_index('Created Week')
x = x.reindex(range(min(s), max(s)+1))
x['Billings'] = x['Billings'].fillna(0)
x = x.ffill().reset_index()
return x
df = df.groupby(['ISO', 'Billed Week']).apply(seqfix).reset_index(drop=True)
df[['Billed Week', 'Billings']] = df[['Billed Week', 'Billings']].astype(int)
df = df[['ISO', 'Product', 'Billed Week', 'Created Week', 'Billings']]
print(df)
ISO Product Billed Week Created Week Billings
0 AT Table 201708 201704 1000
1 AT Table 201708 201705 1000
2 AT Table 201708 201706 1000
3 AT Table 201708 201707 0
4 AT Table 201708 201708 1000
5 DE Table 201705 201701 1000
6 DE Table 201705 201702 1000
7 DE Table 201705 201703 0
8 DE Table 201705 201704 1000
9 DE Table 201705 201705 1000