Python 在三列上汇总,在不同条件下汇总,然后合并

Python 在三列上汇总,在不同条件下汇总,然后合并,python,pandas,dataframe,pandas-groupby,Python,Pandas,Dataframe,Pandas Groupby,我的测试数据帧: print("Create List", end='\n') Test_Data = [('display_name', ['A', 'B', 'B','C','C','C','C','C',]), ('security_type1', ['GOVT', 'CORP','CORP','CORP','CORP','CORP','CORP','CORP']), ('currency_str', ['USD', 'NZD','USD','EUR

我的测试数据帧:

print("Create List",  end='\n') 
Test_Data = [('display_name', ['A', 'B', 'B','C','C','C','C','C',]),
         ('security_type1', ['GOVT', 'CORP','CORP','CORP','CORP','CORP','CORP','CORP']),
         ('currency_str', ['USD', 'NZD','USD','EUR','EUR','GBP','GBP','USD']),
         ('state', ['Done','Passed','Done','Done','Traded Away','Done','Done','Done']),
         ('rfq_qty_CAD_Equiv', [100000, 100000, 100000,100000,100000,100000,100000,100000]),
         ]
dfTest_Data = pd.DataFrame.from_items(Test_Data)
display(dfTest_Data)

display_name    security_type1  currency_str    state   rfq_qty_CAD_Equiv
A                     GOVT          USD          Done         100000
B                     CORP          NZD          Passed       100000
B                     CORP          USD          Done         100000
C                     CORP          EUR          Done         100000
C                     CORP          EUR          Traded Away  100000
C                     CORP          GBP          Done         100000
C                     CORP          GBP          Done         100000
C                     CORP          USD          Done         100000
我想要的结果如下。这里的驱动程序在
display\u name
security\u type1
currency\u str
上分组<代码>总RFQ和
总RFQ
显示名称

display_name    security_type1  currency_str    Done_RFQ    Not_Done_RFQ    Total_RFQ
      A               GOVT           USD            1             0          1  
      B               CORP           USD            1             1          2
      C               CORP           EUR            1             1          5
      C               CORP           GBP            2             0          5
      C               CORP           USD            1             0          5

Hit_Rate    Done_RFQ_Volume     Not_Done_RFQ_Volume     Total_RFQ_Volume
1.00             100000                 0                    100000              
0.50             100000               100000                 200000 
0.20             100000               100000                 500000 
0.40             200000                 0                    500000 
0.20             100000                 0                    500000 


Volume_per_Done_RFQ           Volume_per_Not_Done_RFQ   Volume_per_Total_RFQ
    100000                              0                  100000
    100000                           100000                100000
    100000                           100000                100000
    100000                              0                  100000
    100000                              0                  100000
命中率
=
完成\u询价
/
总询价

Volume\u per\u Done\u RFQ
=
Done\u RFQ\u Volume
/
Done\u RFQ

Volume\u per\u Not\u Done\u RFQ
=
Not\u Done\u RFQ\u Volume
/
Not\u Done\u RFQ

Volume\u per\u Total\u RFQ
=
Total\u RFQ\u Volume
/
Total\u RFQ

大部分工作已经完成,我只是在合并第三个数据帧和显示未完成行项目时遇到了问题,其中需要零个条目

print("All Trades",  end='\n') 
d = [
     ('Total_RFQ_Volume', 'sum'), 
     ('Total_RFQ', 'size'), 
    ]
df1 = dfTest_Data.groupby(['display_name'])['rfq_qty_CAD_Equiv'].agg(d)
display (df1)

print("Done Trades",  end='\n') 
d = [
     ('Done_RFQ_Volume', 'sum'), 
     ('Done_RFQ', 'size'), 
    ]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d).reset_index()
display (df2)

print("Not Done Trades",  end='\n') 
d = [ 
    ('Not_Done_RFQ_Volume', 'sum'), 
    ('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'] .agg(d) .reset_index()
display (df3)

print("Join Done trades on All Trades",  end='\n') 
df_Done_Client_Hit_Rate_Volume = df2.join(df1, on='display_name').join(df3, on='display_name')
# Create additional calculated columns
df_Done_Client_Hit_Rate_Volume['Hit_Rate'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Not_Done_RFQ'] = df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Not_Done_RFQ'] 
df_Done_Client_Hit_Rate_Volume['Volume_per_Total_RFQ'] = df_Done_Client_Hit_Rate_Volume['Total_RFQ_Volume'] / df_Done_Client_Hit_Rate_Volume['Total_RFQ'] 
# Reorder columns
df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume[['display_name', 
                                                                 'security_type1',
                                                                 'currency_str', 
                                                                 'Done_RFQ',
                                                                 'Not_Done_RFQ',
                                                                 'Total_RFQ',
                                                                 'Hit_Rate',
                                                                 'Done_RFQ_Volume',
                                                                 'Not_Done_RFQ_Volume',
                                                                 'Volume_per_Done_RFQ',
                                                                 'Volume_per_Not_Done_RFQ',
                                                                 'Total_RFQ_Volume'
                                                                 'Volume_per_Total_RFQ'
                                                                ]]
display (df_Done_Client_Hit_Rate_Volume)

我认为需要首先删除
df3
.reset\u index

d = [
     ('Done_RFQ_Volume', 'sum'), 
     ('Done_RFQ', 'size'), 
    ]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
#print (df2)

print("Not Done Trades",  end='\n') 
d = [ 
    ('Not_Done_RFQ_Volume', 'sum'), 
    ('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d) 
然后通过和将数据帧连接在一起:

df = pd.concat([df2, df3],axis=1).reset_index()
df_Done_Client_Hit_Rate_Volume = df.join(df1, on='display_name')

最后用
0
替换缺少的值:

df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume.fillna(0)
print (df_Done_Client_Hit_Rate_Volume)

  display_name security_type1 currency_str  Done_RFQ_Volume  Done_RFQ  \
0            A           GOVT          USD         100000.0       1.0   
1            B           CORP          NZD              0.0       0.0   
2            B           CORP          USD         100000.0       1.0   
3            C           CORP          EUR         100000.0       1.0   
4            C           CORP          GBP         200000.0       2.0   
5            C           CORP          USD         100000.0       1.0   

   Not_Done_RFQ_Volume  Not_Done_RFQ  Total_RFQ_Volume  Total_RFQ  Hit_Rate  \
0                  0.0           0.0            100000          1       1.0   
1             100000.0           1.0            200000          2       0.0   
2                  0.0           0.0            200000          2       0.5   
3             100000.0           1.0            500000          5       0.2   
4                  0.0           0.0            500000          5       0.4   
5                  0.0           0.0            500000          5       0.2   

   Volume_per_Done_RFQ  Volume_per_Not_Done_RFQ  Volume_per_Total_RFQ  
0             100000.0                      0.0              100000.0  
1                  0.0                 100000.0              100000.0  
2             100000.0                      0.0              100000.0  
3             100000.0                 100000.0              100000.0  
4             100000.0                      0.0              100000.0  
5             100000.0                      0.0              100000.0  

我认为需要首先删除
df3
.reset\u index

d = [
     ('Done_RFQ_Volume', 'sum'), 
     ('Done_RFQ', 'size'), 
    ]
mask = dfTest_Data['state'].str.contains('Done')
df2 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d)
#print (df2)

print("Not Done Trades",  end='\n') 
d = [ 
    ('Not_Done_RFQ_Volume', 'sum'), 
    ('Not_Done_RFQ', 'size'), 
] 
mask = ~dfTest_Data['state'].str.contains('Done') 
df3 = dfTest_Data[mask].groupby(['display_name','security_type1','currency_str'])['rfq_qty_CAD_Equiv'].agg(d) 
然后通过和将数据帧连接在一起:

df = pd.concat([df2, df3],axis=1).reset_index()
df_Done_Client_Hit_Rate_Volume = df.join(df1, on='display_name')

最后用
0
替换缺少的值:

df_Done_Client_Hit_Rate_Volume = df_Done_Client_Hit_Rate_Volume.fillna(0)
print (df_Done_Client_Hit_Rate_Volume)

  display_name security_type1 currency_str  Done_RFQ_Volume  Done_RFQ  \
0            A           GOVT          USD         100000.0       1.0   
1            B           CORP          NZD              0.0       0.0   
2            B           CORP          USD         100000.0       1.0   
3            C           CORP          EUR         100000.0       1.0   
4            C           CORP          GBP         200000.0       2.0   
5            C           CORP          USD         100000.0       1.0   

   Not_Done_RFQ_Volume  Not_Done_RFQ  Total_RFQ_Volume  Total_RFQ  Hit_Rate  \
0                  0.0           0.0            100000          1       1.0   
1             100000.0           1.0            200000          2       0.0   
2                  0.0           0.0            200000          2       0.5   
3             100000.0           1.0            500000          5       0.2   
4                  0.0           0.0            500000          5       0.4   
5                  0.0           0.0            500000          5       0.2   

   Volume_per_Done_RFQ  Volume_per_Not_Done_RFQ  Volume_per_Total_RFQ  
0             100000.0                      0.0              100000.0  
1                  0.0                 100000.0              100000.0  
2             100000.0                      0.0              100000.0  
3             100000.0                 100000.0              100000.0  
4             100000.0                      0.0              100000.0  
5             100000.0                      0.0              100000.0  

它在耶斯雷尔非常有效。我想弄明白这件事,差点把垫圈炸了。谢谢!!它在耶斯雷尔非常有效。我想弄明白这件事,差点把垫圈炸了。谢谢!!