Python: MultiIndex problem in pd.crosstab when appending rows for clients that are not in the data


Good morning, and happy Friday.

I have some Excel output that uses a crosstab to show a count of each Result per Client and Sector. This works well:

import pandas as pd

dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
                'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = {  'Client': ['AAA', 'AAA', 'AAA',
                 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
                 'CCC',
                'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
        'Sector': ['GOV', 'GOV', 'CORP',
                 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
                 'GOV',
                 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
        'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
               'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
               'No RFQ',
               'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
      }
df = pd.DataFrame.from_dict(df)
# print(df)

vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']

df = (pd.crosstab([df.Client,
                  df.Sector],
                 df.Result,
                 margins=True,
                 margins_name='Total_Result_Per_Client')
        .drop('Total_Result_Per_Client')
        .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print(df)
# Raw string so the backslashes in the Windows path are not treated as escapes
df.to_excel(r'C:\Temp\Out_Data_EOM_Key_Clients_Corp.xlsx')
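
The Excel extract meets the requirements.

An additional request is to add all other possible clients that are not in the current month's data but may appear in future months. For each Client with no data, a row should be added to the crosstab with N/A inserted for each field. The final output would be: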

I was hoping to add these rows as follows:

# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)
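
The problem is that the crosstab result has a MultiIndex. I tried flattening the crosstab output with

df = df.stack([0]).reset_index()

but that changes the structure completely and takes me far away from the final output. I now get:

TypeError: Expected tuple, got str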
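
A small sketch of why flat client names do not fit this index. The toy data and the names data and ct are made up for illustration and are not part of the original post. Each row label of the crosstab is a (Client, Sector) tuple, so any labels passed to reindex need to be tuples of the same depth, not plain strings.

```python
import pandas as pd

# Toy data, assumed for illustration only
data = pd.DataFrame({
    "Client": ["AAA", "AAA", "BBB"],
    "Sector": ["GOV", "CORP", "GOV"],
    "Result": ["Done", "Covered", "Done"],
})

# Two row keys produce a two-level MultiIndex on the rows
ct = pd.crosstab([data.Client, data.Sector], data.Result)

print(ct.index.nlevels)   # number of index levels
print(ct.index.tolist())  # each label is a (Client, Sector) tuple
```

A flat label such as 'EEE' has no Sector component, which is consistent with the error message asking for a tuple.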


Any help would be greatly appreciated.

You can try reindex:

# Clients missing from the crosstab (here EEE and FFF) can be collected dynamically:

cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None) for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))
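
For anyone who wants to try the idea in isolation, here is a minimal self-contained sketch of the same reindex approach on toy data. The names all_clients, data, ct, missing and out are hypothetical. It differs from the snippet above in two ways that are my assumptions, not part of the answer: the Sector placeholder is the string "N/A" instead of None, and fill_value=0 is passed so that the new rows hold zeros instead of NaN.

```python
import pandas as pd

# Toy inputs, assumed for illustration only
all_clients = pd.DataFrame({"All_Client": ["AAA", "BBB", "CCC"]})
data = pd.DataFrame({
    "Client": ["AAA", "AAA", "BBB"],
    "Sector": ["GOV", "CORP", "GOV"],
    "Result": ["Done", "Covered", "Done"],
})
ct = pd.crosstab([data.Client, data.Sector], data.Result)

# Clients in the master list that have no rows in the crosstab
present = ct.index.get_level_values("Client")
missing = (all_clients.loc[~all_clients.All_Client.isin(present), "All_Client"]
           .unique().tolist())

# Build (Client, Sector) tuples; "N/A" is an assumed placeholder sector
new_rows = [(client, "N/A") for client in missing]
full_index = pd.MultiIndex.from_tuples(ct.index.tolist() + new_rows,
                                       names=ct.index.names)

# Reindex on the rows only; existing rows are kept, new rows are zero-filled
out = ct.reindex(full_index, fill_value=0)
print(out)
```

Because the missing clients are derived from the master list on every run, nothing is hard-coded; a changed client list simply changes missing.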

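Hi @BEN_YO. This works, but it is hard-coded. The list of all clients is dynamic and can change every day. That is why I assign the clients to a list dynamically via new_index = tuple(list(dfAll_Clients.All_Client)).

@PeterLucas I have added a dynamic way as well.

Ah, OK, I will check it out. Thanks, mate. This was giving me a headache.

Hi BEN_YO. There is a small issue when I apply this to live data: the appended clients create new columns. If you update the code above from A to AAA, B to BBB, C to CCC ... F to FFF, you will see the output.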