Python 熊猫:列上的多索引丢失后从宽到长
考虑通常的Python 熊猫:列上的多索引丢失后从宽到长,python,pandas,Python,Pandas,考虑通常的交易数据帧: trades = pd.DataFrame({ 'time': pd.to_datetime(['20160525 13:30:00.023', '20160525 13:30:00.038', '20160525 13:30:00.048', '20160525 13:30:00.048',
交易
数据帧:
trades = pd.DataFrame({
'time': pd.to_datetime(['20160525 13:30:00.023',
'20160525 13:30:00.038',
'20160525 13:30:00.048',
'20160525 13:30:00.048',
'20160525 13:30:00.048']),
'ticker': ['MSFT', 'MSFT','GOOG', 'BOOB', 'AAPL'],
'price': [51.95, 51.95, 720.77, 720.92, 98.00],
'quantity': [75, 155, 100, 100, 100]},
columns=['time', 'ticker', 'price', 'quantity'])
trades
Out[42]:
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 GOOG 720.77 100
3 2016-05-25 13:30:00.048 BOOB 720.92 100
4 2016-05-25 13:30:00.048 AAPL 98.00 100
在这里,我想从长到宽进行重塑,做一些事情,然后从宽到长进行重塑
从长到宽很容易
trades.set_index(['time','ticker'], inplace = True)
trades = trades.unstack()
trades
Out[44]:
price quantity \
ticker AAPL BOOB GOOG MSFT AAPL BOOB GOOG
time
2016-05-25 13:30:00.023 NaN NaN NaN 51.95 NaN NaN NaN
2016-05-25 13:30:00.038 NaN NaN NaN 51.95 NaN NaN NaN
2016-05-25 13:30:00.048 98.0 720.92 720.77 NaN 100.0 100.0 100.0
ticker MSFT
time
2016-05-25 13:30:00.023 75.0
2016-05-25 13:30:00.038 155.0
2016-05-25 13:30:00.048 NaN
但是现在出于许多原因,我确实希望在列上使用此多索引,因此我有以下代码:
trades.columns=['_'.join(t) for t in trades.columns]
这基本上摆脱了多索引,允许我处理普通列。现在的数据如下所示:
trades
Out[47]:
price_AAPL price_BOOB price_GOOG price_MSFT \
time
2016-05-25 13:30:00.023 NaN NaN NaN 51.95
2016-05-25 13:30:00.038 NaN NaN NaN 51.95
2016-05-25 13:30:00.048 98.0 720.92 720.77 NaN
quantity_AAPL quantity_BOOB quantity_GOOG \
time
2016-05-25 13:30:00.023 NaN NaN NaN
2016-05-25 13:30:00.038 NaN NaN NaN
2016-05-25 13:30:00.048 100.0 100.0 100.0
quantity_MSFT
time
2016-05-25 13:30:00.023 75.0
2016-05-25 13:30:00.038 155.0
2016-05-25 13:30:00.048 NaN
问题是:现在如何回到长格式?您可以使用:
#create MultiIndex from columns
trades.columns = trades.columns.str.split('_', expand=True)
#stack and set index names for new column names
trades = trades.stack().rename_axis(['time','ticker']).reset_index()
#convert to int
trades.quantity = trades.quantity.astype(int)
print (trades)
time ticker price quantity
0 2016-05-25 13:30:00.023 MSFT 51.95 75
1 2016-05-25 13:30:00.038 MSFT 51.95 155
2 2016-05-25 13:30:00.048 AAPL 98.00 100
3 2016-05-25 13:30:00.048 BOOB 720.92 100
4 2016-05-25 13:30:00.048 GOOG 720.77 100
魔术我想这里的关键是
expand=True
参数?是的,然后创建了MultiIndex
它是在[38]中的-中实现的:
nice,我能问你一个简单的后续问题吗?有些列看起来像col\u quantity\u jezrael
,其中jezrael
是库存,col\u quantity
对应于上面的数量。有没有办法让split
在这里正常工作(即将列分为col\u quantity
和jezrael
)?然后需要trades.columns=trades.columns.str.rsplit('''uu',expand=True,n=1)