Python 从字典创建多索引/分层数据框架

Python 从字典创建多索引/分层数据框架,python,pandas,Python,Pandas,假设我有以下词典: multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []} column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []} 如何使用这些字典创建多索引数据帧 应该是这样的: index_1 index_2 column_data_1 foo A 2

假设我有以下词典:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3],    'baz': []}
如何使用这些字典创建多索引数据帧

应该是这样的:

index_1  index_2     column_data_1
foo      A           2
         B           4
         C           5
bar      X           2
         Y           3
baz      np.NaN      np.NaN 
注: 如果Pandas不支持
NaN
索引,我们可以删除上面字典中的空条目


理想情况下,我希望DataFrame能够捕捉到这样一个事实:如果可能的话,这些条目会丢失。然而,最重要的是能够使用
多级索引中的索引对数据帧进行索引
使用
concat

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=multilevel_indices.keys())
结果:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
此外,正如@CT Zhu所提到的,在
baz
的定义中,如果将
[]
更改为
[None]
,则可以跟踪这些条目:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

使用
concat

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=multilevel_indices.keys())
结果:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
此外,正如@CT Zhu所提到的,在
baz
的定义中,如果将
[]
更改为
[None]
,则可以跟踪这些条目:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

使用
concat

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=multilevel_indices.keys())
结果:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
此外,正如@CT Zhu所提到的,在
baz
的定义中,如果将
[]
更改为
[None]
,则可以跟踪这些条目:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

使用
concat

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': []}

pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=multilevel_indices.keys())
结果:

foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
此外,正如@CT Zhu所提到的,在
baz
的定义中,如果将
[]
更改为
[None]
,则可以跟踪这些条目:

baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object

您拥有的原始数据集可能不会产生
nan
索引,但稍加更改即可

In [137]:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
mindex=pd.MultiIndex(levels=[multilevel_indices.keys(),list(chain(*multilevel_indices.values()))],
                     labels=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                             range(sum(map(len, multilevel_indices.values())))],
                     names=['index_1',  'index_2'])
print pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1'])


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]

您拥有的原始数据集可能不会产生
nan
索引,但稍加更改即可

In [137]:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
mindex=pd.MultiIndex(levels=[multilevel_indices.keys(),list(chain(*multilevel_indices.values()))],
                     labels=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                             range(sum(map(len, multilevel_indices.values())))],
                     names=['index_1',  'index_2'])
print pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1'])


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]

您拥有的原始数据集可能不会产生
nan
索引,但稍加更改即可

In [137]:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
mindex=pd.MultiIndex(levels=[multilevel_indices.keys(),list(chain(*multilevel_indices.values()))],
                     labels=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                             range(sum(map(len, multilevel_indices.values())))],
                     names=['index_1',  'index_2'])
print pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1'])


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]

您拥有的原始数据集可能不会产生
nan
索引,但稍加更改即可

In [137]:

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}    
column_data_1      = {'foo': [2, 4, 5],       'bar': [2, 3], 'baz': [None]}
mindex=pd.MultiIndex(levels=[multilevel_indices.keys(),list(chain(*multilevel_indices.values()))],
                     labels=[list(chain(*[[i]*len(v[1]) for i, v in enumerate(multilevel_indices.items())])),
                             range(sum(map(len, multilevel_indices.values())))],
                     names=['index_1',  'index_2'])
print pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1'])


                 column_data_1
index_1 index_2               
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3

[6 rows x 1 columns]