Python: Applying a multi-level column index to a DataFrame

python pandas indexing

I have several files containing time-series data for various stocks, with multiple fields per stock. Each file contains

time, open, high, low, close, volume

The goal is to combine all of them into a single DataFrame of the form

field      open                              high                            ...
security    hk_1      hk_2      hk_3 ...      hk_1      hk_2      hk_3 ...  ...
time
t_1      open_1_1  open_2_1  open_3_1 ...  high_1_1  high_2_1  high_3_1 ...  ...            
t_2      open_1_2  open_2_2  open_3_2 ...  high_1_2  high_2_2  high_3_2 ...  ...
...        ...        ...       ... ...       ...       ...       ... ...  ...

I created a MultiIndex

fields = ['time','open','high','low','close','volume','numEvents','value']
midx = pd.MultiIndex.from_product([[security_name], fields], names=['security', 'field'])

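First, I tried to apply this MultiIndex to the DataFrame obtained by reading the data from CSV (by creating a new DataFrame and passing the index to it).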

However, the new DataFrame contains only NaN:

security    1_HK
field       time    open    high    low     close   volume
time                                
 NaN         NaN     NaN     NaN    NaN       NaN      NaN

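In addition, it still contains a time column, even though I tried to make time the index (so that the DataFrames for all the other stocks can later be joined on the index to obtain the aggregated DataFrame).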

How can I apply the MultiIndex to a DataFrame without losing the data, and then join DataFrames like this

security    1_HK
field       time    open    high    low     close   volume
time

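to create something like this (note that the field and security levels of the hierarchy are swapped)?

field       time                open    high        ...
security    1_HK    2_HK ...    1_HK    2_HK ...    ...
time

I think you can first collect all the files in a list, files, then use a list comprehension to read them into DataFrames and concat them by columns (axis=1). If you add the keys parameter, you get a MultiIndex in the columns:

import pandas as pd
import glob

# collect the CSV paths and read each one into its own DataFrame
files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp) for fp in files]

# one key per DataFrame; the keys become the outer column level
eqty_names_list = ['hk1','hk2','hk3']
df = pd.concat(dfs, keys=eqty_names_list, axis=1)

print(df)
  hk1       hk2       hk3      
    a  b  c   a  b  c   a  b  c
0   0  1  2   0  9  6   0  7  1
1   1  5  8   1  6  4   1  3  2

Finally, swaplevel and sort_index are needed:

# move the field level to the top, then group the columns by field
df.columns = df.columns.swaplevel(0,1)
df = df.sort_index(axis=1)
print(df)
    a           b           c        
  hk1 hk2 hk3 hk1 hk2 hk3 hk1 hk2 hk3
0   0   0   0   1   9   7   2   6   1
1   1   1   1   5   6   3   8   4   2

Thanks, that works great. A note for anyone reading this: if you read the data in this way, the order of the files must match the order of the names in eqty_names_list.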
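
The ordering caveat from the comment can be sidestepped by deriving the keys from the file names instead of keeping a separate list. The sketch below is only an illustration under stated assumptions: the sample files, their contents, and the names hk1 and hk2 are made up, and it assumes each file is named after its security and has a time column. Reading with index_col='time' also makes time the row index, so the frames align on time rather than on row position.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files standing in for the real per-security CSVs.
tmp_dir = Path(tempfile.mkdtemp())
samples = {
    "hk1": "time,open,close\nt_1,1.0,1.5\nt_2,2.0,2.5\n",
    "hk2": "time,open,close\nt_1,10.0,10.5\nt_2,20.0,20.5\n",
}
for name, text in samples.items():
    (tmp_dir / f"{name}.csv").write_text(text)

# Key every frame by its file name (without extension), so a key can never
# be paired with the wrong file. Sorting makes the order deterministic.
paths = sorted(tmp_dir.glob("*.csv"))
frames = {p.stem: pd.read_csv(p, index_col="time") for p in paths}

# With a dict, concat uses the dict keys as the outer column level.
df = pd.concat(frames, axis=1)
df.columns.names = ["security", "field"]

# Put field on the outer level and group the columns by field.
df.columns = df.columns.swaplevel(0, 1)
df = df.sort_index(axis=1)

print(df)
```

Because the existing columns are only given an extra level here and are never reindexed against labels they do not contain, the values are carried over unchanged.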