Python: creating a multi-index (hierarchical) DataFrame from dictionaries
Tags: python, pandas
Suppose I have the following dictionaries:
multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []}
How can I create a multi-index DataFrame from these dictionaries?
It should look like this:
index_1  index_2  column_data_1
foo      A        2
         B        4
         C        5
bar      X        2
         Y        3
baz      np.NaN   np.NaN
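Note: if pandas does not support NaN in an index, the empty entries can simply be dropped from the dictionaries above.
Ideally, I would like the DataFrame to capture the fact that these entries are missing, if that is possible. Most important, though, is being able to index the DataFrame by the levels of the multi-level index.
Use concat: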
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []}
# One Series per key, stacked with the dictionary keys as the outer index level
pd.concat([pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in multilevel_indices],
          keys=list(multilevel_indices))
Result:
foo  A    2
     B    4
     C    5
bar  X    2
     Y    3
dtype: float64
Also, as @CT Zhu mentioned, if you change [] to [None] in the definition of baz, those entries are kept and can be tracked:
baz  NaN    None
foo  A         2
     B         4
     C         5
bar  X         2
     Y         3
dtype: object
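Since the main requirement in the question is being able to index by the multi-level index, here is a minimal sketch of how the concat result could be named, turned into a DataFrame, and queried. The variable names `keys`, `s`, `df`, `value` and `foo_rows` are illustrative, and skipping empty entries is an assumption based on the question's note that they may be dropped.

```python
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []}

# Assumption: empty entries are skipped, as the question allows.
keys = [k for k in multilevel_indices if multilevel_indices[k]]

# Stack one Series per key; `names` labels the two index levels.
s = pd.concat(
    [pd.Series(column_data_1[k], index=multilevel_indices[k]) for k in keys],
    keys=keys,
    names=['index_1', 'index_2'],
)

# A one-column DataFrame using the column name from the question.
df = s.to_frame('column_data_1')

# Look up a single cell by (outer, inner) label.
value = df.loc[('foo', 'B'), 'column_data_1']

# Select every row under one outer label; the result is indexed by index_2.
foo_rows = df.loc['foo']
```

Selecting by the outer label alone returns the sub-frame for that group, while a full `(outer, inner)` tuple addresses one row.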
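The original dataset as given probably cannot produce a NaN index entry, but a small change (using [None] instead of [] for baz) makes it possible: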
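from itertools import chain
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': [None]}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': [None]}
# Level 0 holds the dictionary keys; level 1 holds every inner label, flattened in order.
# Current pandas names the integer positions `codes` (older releases called them `labels`).
# On Python 3.7+ the rows follow dictionary insertion order (foo, bar, baz).
mindex = pd.MultiIndex(levels=[list(multilevel_indices), list(chain(*multilevel_indices.values()))],
                       codes=[list(chain(*[[i] * len(v) for i, v in enumerate(multilevel_indices.values())])),
                              list(range(sum(map(len, multilevel_indices.values()))))],
                       names=['index_1', 'index_2'])
print(pd.DataFrame(list(chain(*column_data_1.values())), index=mindex, columns=['column_data_1']))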
                 column_data_1
index_1 index_2
baz     NaN                NaN
foo     A                    2
        B                    4
        C                    5
bar     X                    2
        Y                    3
[6 rows x 1 columns]
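As an alternative that is not part of the original answers, the same frame could be assembled from flat rows and `set_index`, which avoids building the integer codes by hand. This is only a sketch: the `rows` list and the choice to stand in a single NaN row for an empty entry are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

multilevel_indices = {'foo': ['A', 'B', 'C'], 'bar': ['X', 'Y'], 'baz': []}
column_data_1 = {'foo': [2, 4, 5], 'bar': [2, 3], 'baz': []}

rows = []
for key, labels in multilevel_indices.items():
    values = column_data_1[key]
    if not labels:
        # Assumption: represent an empty entry as one row of missing values.
        labels, values = [np.nan], [np.nan]
    rows.extend((key, label, value) for label, value in zip(labels, values))

# Build a flat table first, then promote two columns to the index.
df = pd.DataFrame(rows, columns=['index_1', 'index_2', 'column_data_1'])
df = df.set_index(['index_1', 'index_2'])

# Rows whose value is missing can be located with isna().
missing = df[df['column_data_1'].isna()]
```

Because one value is NaN, the column is stored as floating point; the missing entry remains visible as its own row instead of being silently dropped.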