Python pandas: wide to long after the MultiIndex on the columns is lost

Consider the usual trades DataFrame:

import pandas as pd

trades = pd.DataFrame({
         'time': pd.to_datetime(['20160525 13:30:00.023',
                      '20160525 13:30:00.038',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048',
                      '20160525 13:30:00.048']),
         'ticker': ['MSFT', 'MSFT','GOOG', 'BOOB', 'AAPL'],
         'price': [51.95, 51.95, 720.77, 720.92, 98.00],
         'quantity': [75, 155, 100, 100, 100]},
        columns=['time', 'ticker', 'price', 'quantity'])

trades
Out[42]: 
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   GOOG  720.77       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   AAPL   98.00       100
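Here I want to reshape from long to wide, do some work on the data, and then reshape from wide back to long.

Going from long to wide is easy: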

trades.set_index(['time','ticker'], inplace = True)
trades = trades.unstack()

trades
Out[44]: 
                        price                        quantity                \
ticker                   AAPL    BOOB    GOOG   MSFT     AAPL   BOOB   GOOG   
time                                                                          
2016-05-25 13:30:00.023   NaN     NaN     NaN  51.95      NaN    NaN    NaN   
2016-05-25 13:30:00.038   NaN     NaN     NaN  51.95      NaN    NaN    NaN   
2016-05-25 13:30:00.048  98.0  720.92  720.77    NaN    100.0  100.0  100.0   


ticker                    MSFT  
time                            
2016-05-25 13:30:00.023   75.0  
2016-05-25 13:30:00.038  155.0  
2016-05-25 13:30:00.048    NaN  
However, for a number of reasons I do not want this MultiIndex on the columns, so I use the following code:

trades.columns=['_'.join(t) for t in trades.columns]
This essentially gets rid of the MultiIndex and lets me work with ordinary columns. The data now looks like this:

trades
Out[47]: 
                         price_AAPL  price_BOOB  price_GOOG  price_MSFT  \
time                                                                      
2016-05-25 13:30:00.023         NaN         NaN         NaN       51.95   
2016-05-25 13:30:00.038         NaN         NaN         NaN       51.95   
2016-05-25 13:30:00.048        98.0      720.92      720.77         NaN   

                         quantity_AAPL  quantity_BOOB  quantity_GOOG  \
time                                                                   
2016-05-25 13:30:00.023            NaN            NaN            NaN   
2016-05-25 13:30:00.038            NaN            NaN            NaN   
2016-05-25 13:30:00.048          100.0          100.0          100.0   

                         quantity_MSFT  
time                                    
2016-05-25 13:30:00.023           75.0  
2016-05-25 13:30:00.038          155.0  
2016-05-25 13:30:00.048            NaN 
The question is: how do I get back to the long format now?

You can use:

#create MultiIndex from columns 
trades.columns = trades.columns.str.split('_', expand=True)
#stack and set index names for new column names
trades = trades.stack().rename_axis(['time','ticker']).reset_index()
#convert to int
trades.quantity = trades.quantity.astype(int)
print (trades)
                     time ticker   price  quantity
0 2016-05-25 13:30:00.023   MSFT   51.95        75
1 2016-05-25 13:30:00.038   MSFT   51.95       155
2 2016-05-25 13:30:00.048   AAPL   98.00       100
3 2016-05-25 13:30:00.048   BOOB  720.92       100
4 2016-05-25 13:30:00.048   GOOG  720.77       100
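
For anyone who wants to try the round trip without the earlier steps, here is a minimal self-contained sketch. The small `wide` frame and the names `wide` and `long_df` are made up for illustration. It also calls `dropna(how='all')` explicitly, on the assumption that not every pandas version drops the all-NaN rows inside `stack()` by itself; where `stack()` already drops them, the extra call should be harmless.

```python
import pandas as pd

nan = float('nan')

# Hypothetical wide frame with flattened "<field>_<ticker>" column names.
wide = pd.DataFrame(
    {
        'price_AAPL': [nan, 98.0],
        'price_MSFT': [51.95, nan],
        'quantity_AAPL': [nan, 100.0],
        'quantity_MSFT': [75.0, nan],
    },
    index=pd.DatetimeIndex(
        ['2016-05-25 13:30:00.023', '2016-05-25 13:30:00.048'], name='time'
    ),
)

# Rebuild the two-level column index from the flattened names.
wide.columns = wide.columns.str.split('_', expand=True)

long_df = (
    wide.stack()                          # move the ticker level into the rows
        .dropna(how='all')                # drop (time, ticker) pairs with no data
        .rename_axis(['time', 'ticker'])  # name the two row index levels
        .reset_index()
        .sort_values(['time', 'ticker'], ignore_index=True)
)

# quantity became float because of the NaNs; convert it back to int.
long_df['quantity'] = long_df['quantity'].astype(int)
print(long_df)
```

The final sort is only there to make the row order predictable; it is not needed for the reshape itself.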

Magic! I guess the key here is the expand=True argument?
Yes, with expand=True a MultiIndex is created.
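
To illustrate what `expand=True` changes, here is a small sketch on a standalone `Index`. The variable names `flat`, `as_lists` and `as_levels` are only for illustration.

```python
import pandas as pd

flat = pd.Index(['price_AAPL', 'price_MSFT', 'quantity_AAPL', 'quantity_MSFT'])

# Without expand=True the result is still a single-level Index;
# each element is a list of the split pieces.
as_lists = flat.str.split('_')

# With expand=True each piece becomes its own level, i.e. a MultiIndex.
as_levels = flat.str.split('_', expand=True)

print(type(as_lists).__name__, as_lists.nlevels)
print(type(as_levels).__name__, as_levels.nlevels)
print(as_levels.tolist())
```

Only the second form can be assigned to `trades.columns` and then stacked, because `stack()` needs a real column level to move into the rows.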
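Nice, may I ask a quick follow-up question? Some columns look like col_quantity_jezrael, where jezrael is the stock and col_quantity corresponds to quantity above. Is there a way to make split work properly here (i.e. split the column name into col_quantity and jezrael)?
Then you need rsplit with n=1, which splits only on the last underscore:
trades.columns = trades.columns.str.rsplit('_', expand=True, n=1)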
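
A short sketch of the difference between `split` and `rsplit` with `n=1` on such names. The second column name, `col_price_jezrael`, is a made-up example that was not part of the question.

```python
import pandas as pd

cols = pd.Index(['col_quantity_jezrael', 'col_price_jezrael'])

# split breaks on every underscore, which gives three levels here.
every = cols.str.split('_', expand=True)

# rsplit with n=1 breaks only once, starting from the right,
# so everything before the last underscore stays together.
last = cols.str.rsplit('_', n=1, expand=True)

print(every.nlevels, every.tolist())
print(last.nlevels, last.tolist())
```

This relies on the ticker itself not containing an underscore; if it might, a different separator would be needed when flattening the columns.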