Python: prepend a level to a pandas MultiIndex
I have a DataFrame whose MultiIndex was created by a groupby:
import numpy as np
import pandas as p
from numpy.random import randn
df = p.DataFrame({
    'A' : ['a1', 'a1', 'a2', 'a3']
  , 'B' : ['b1', 'b2', 'b3', 'b4']
  , 'Vals' : randn(4)
}).groupby(['A', 'B']).sum()
df
Output>            Vals
Output> A  B
Output> a1 b1 -1.632460
Output>    b2  0.596027
Output> a2 b3 -0.619130
Output> a3 b4 -0.002009
How can I prepend a level to the MultiIndex so that it turns into something like this:
Output>                       Vals
Output> FirstLevel A  B
Output> Foo        a1 b1 -1.632460
Output>               b2  0.596027
Output>            a2 b3 -0.619130
Output>            a3 b4 -0.002009
You can first add it as a normal column and then append it to the current index:
df['Firstlevel'] = 'Foo'
df.set_index('Firstlevel', append=True, inplace=True)
If needed, change the order of the levels with:
df.reorder_levels(['Firstlevel', 'A', 'B'])
Which results in:
                      Vals
Firstlevel A  B
Foo        a1 b1  0.871563
              b2  0.494001
           a2 b3 -0.167811
           a3 b4 -1.353409
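For reference, here is a minimal end-to-end sketch of the column-then-index approach. It assumes fixed values in place of `randn(4)` so the result is reproducible, and the name `out` is only illustrative. Note that `reorder_levels` returns a new object, so the result has to be assigned if you want to keep it.

```python
import pandas as pd

# Fixed values stand in for randn(4) (an assumption, for reproducibility).
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': [1.0, 2.0, 3.0, 4.0],
}).groupby(['A', 'B']).sum()

# Work on a copy so the original frame keeps its two-level index.
out = df.copy()
out['Firstlevel'] = 'Foo'

# append=True adds the column as the innermost (last) index level.
out = out.set_index('Firstlevel', append=True)

# reorder_levels does not modify in place; keep the returned frame.
out = out.reorder_levels(['Firstlevel', 'A', 'B'])

print(out)
```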
A nice way to do this in one line is to use pandas.concat(). The shortest form:
pd.concat({'Foo': df}, names=['Firstlevel'])
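As a sketch of how the one-liner behaves, the snippet below compares the dict form with the equivalent list-plus-`keys` form and then stacks two frames under different keys. The fixed values and the variable names (`by_dict`, `by_list`, `both`) are assumptions made for illustration.

```python
import pandas as pd

# Fixed values stand in for randn(4) (an assumption, for reproducibility).
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': [1.0, 2.0, 3.0, 4.0],
}).groupby(['A', 'B']).sum()

# Dict form: each dict key becomes a value of the new outermost level.
by_dict = pd.concat({'Foo': df}, names=['Firstlevel'])

# List form: one entry in keys per frame in the list.
by_list = pd.concat([df], keys=['Foo'], names=['Firstlevel'])

# Several frames at once, each under its own key.
both = pd.concat({'Foo': df, 'Bar': df}, names=['Firstlevel'])

print(by_dict.index.names)
print(both)
```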
This can be generalized to many DataFrames; see the pandas.concat documentation.
I think this is a more general solution:
# Convert index to dataframe
old_idx = df.index.to_frame()
# Insert new level at specified location
old_idx.insert(0, 'new_level_name', new_level_values)
# Convert back to MultiIndex
df.index = pandas.MultiIndex.from_frame(old_idx)
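Compared with the other answers, this has some advantages:
- The new level can be added at any position, not just at the top.
- It is purely an operation on the index and does not need to manipulate the data, as the concatenation trick does.
- It does not require adding a column as an intermediate step, which can break multi-level column indexes.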
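To illustrate the first point, here is a small sketch that inserts the new level between the two existing ones. The level name `'Mid'` and the fixed values are placeholders chosen for this example.

```python
import pandas as pd

# Fixed values stand in for randn(4) (an assumption, for reproducibility).
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': [1.0, 2.0, 3.0, 4.0],
}).groupby(['A', 'B']).sum()

# One column per index level: 'A' and 'B'.
idx = df.index.to_frame()

# Position 1 places the new level between A and B; the scalar is broadcast.
idx.insert(1, 'Mid', 'Foo')

# Only the index is rebuilt; the data column is left untouched.
df.index = pd.MultiIndex.from_frame(idx)

print(df)
```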
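The to_frame() method creates new names for index levels that do not have one, so the new index ends up with names that did not exist in the old index. I added some code to revert this name change.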
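The naming issue can be seen in a short sketch like the one below. It assumes a two-level index whose second level is unnamed; the exact placeholder label pandas picks for that level is not relied on here, only the fact that it is no longer `None`.

```python
import pandas as pd

# Second level deliberately left without a name.
idx = pd.MultiIndex.from_arrays(
    [['a1', 'a2'], ['b1', 'b2']],
    names=['A', None],
)

# to_frame() needs a column label for every level, so it makes one up.
frame = idx.to_frame()

# Round-tripping without names= keeps the made-up label as a level name.
rebuilt = pd.MultiIndex.from_frame(frame)

# Passing the original names restores the unnamed level.
restored = pd.MultiIndex.from_frame(frame, names=idx.names)

print(list(rebuilt.names))
print(list(restored.names))
```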
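Below is the code. I have been using it myself for a while and it seems to work pretty well. If you find any issues or edge cases, I would be much obliged to adjust my answer.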
from typing import Any, Optional

import pandas as pd


def _handle_insert_loc(loc: int, n: int) -> int:
    """
    Computes the insert index from the right if loc is negative for a given size of n.
    """
    return n + loc + 1 if loc < 0 else loc


def add_index_level(old_index: pd.Index, value: Any, name: Optional[str] = None, loc: int = 0) -> pd.MultiIndex:
    """
    Expand a (multi)index by adding a level to it.
    :param old_index: The index to expand
    :param name: The name of the new index level
    :param value: Scalar or list-like, the values of the new index level
    :param loc: Where to insert the level in the index, 0 is at the front, negative values count back from the rear end
    :return: A new multi-index with the new level added
    """
    loc = _handle_insert_loc(loc, len(old_index.names))
    old_index_df = old_index.to_frame()
    old_index_df.insert(loc, name, value)
    new_index_names = list(old_index.names)  # sometimes new index level names are invented when converting to a df,
    new_index_names.insert(loc, name)        # here the original names are reconstructed
    new_index = pd.MultiIndex.from_frame(old_index_df, names=new_index_names)
    return new_index
How about building it from scratch with MultiIndex.from_tuples?
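Similar to the other answer, this is a flexible approach that avoids modifying the DataFrame's underlying array.
Note that if you do this on a DataFrame with a MultiIndex column index, this adds levels, which probably will not matter in most cases, but might if you rely on the metadata for something else.
This is especially nice for adding a level to the columns by passing axis=1, since df.columns has no "set_index" method like the index does, which always bugs me.
This is nice because it also works for pd.Series objects, which the currently accepted answer (from 2013) does not.
This no longer works. TypeError: unhashable type: 'list'
It took me a while to realize that if you have more than one key for FirstLevel, as in ['Foo', 'Bar'], the first argument also needs the corresponding length, i.e. [df] * len(['Foo', 'Bar'])!
Even more concisely: pd.concat({'Foo': df}, names=['Firstlevel'])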
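The points raised in these comments can be tried out with a sketch along the following lines. The small frame and the names `wide`, `s2` and `two` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

# axis=1 prepends the new level to the columns instead of the index.
wide = pd.concat({'Foo': df}, axis=1, names=['Firstlevel'])

# The same call works on a Series, which has no set_index method.
s2 = pd.concat({'Foo': df['x']}, names=['Firstlevel'])

# With the list form, the list needs one frame per key.
keys = ['Foo', 'Bar']
two = pd.concat([df] * len(keys), keys=keys, names=['Firstlevel'])

print(wide)
print(s2)
print(two)
```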
Some unit test code for add_index_level:
import unittest

import numpy as np
import pandas as pd


class TestPandaStuff(unittest.TestCase):

    def test_add_index_level(self):
        df = pd.DataFrame(data=np.random.normal(size=(6, 3)))
        i1 = add_index_level(df.index, "foo")

        # it does not invent new index names where they are missing
        self.assertEqual([None, None], i1.names)

        # the new level values are added
        self.assertTrue(np.all(i1.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i1.get_level_values(1) == df.index))

        # it does not invent new index names where they are missing
        i2 = add_index_level(i1, ["x", "y"]*3, name="xy", loc=2)
        i3 = add_index_level(i2, ["a", "b", "c"]*2, name="abc", loc=-1)
        self.assertEqual([None, None, "xy", "abc"], i3.names)

        # the new level values are added
        self.assertTrue(np.all(i3.get_level_values(0) == "foo"))
        self.assertTrue(np.all(i3.get_level_values(1) == df.index))
        self.assertTrue(np.all(i3.get_level_values(2) == ["x", "y"]*3))
        self.assertTrue(np.all(i3.get_level_values(3) == ["a", "b", "c"]*2))

        # df.index = i3
        # print()
        # print(df)
Building the index from scratch with MultiIndex.from_tuples:
df.index = p.MultiIndex.from_tuples(
    [(nl, A, B) for nl, (A, B) in
     zip(['Foo'] * len(df), df.index)],
    names=['FirstLevel', 'A', 'B'])
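A self-contained variation of the from_tuples snippet might look like the following. It is a sketch rather than the answer's exact code: it assumes fixed values in place of `randn(4)` and unpacks each existing index tuple, so it is not tied to exactly two levels.

```python
import pandas as pd

# Fixed values stand in for randn(4) (an assumption, for reproducibility).
df = pd.DataFrame({
    'A': ['a1', 'a1', 'a2', 'a3'],
    'B': ['b1', 'b2', 'b3', 'b4'],
    'Vals': [1.0, 2.0, 3.0, 4.0],
}).groupby(['A', 'B']).sum()

# Iterating a MultiIndex yields one tuple per row; prepend 'Foo' to each
# and reuse the existing level names after the new one.
df.index = pd.MultiIndex.from_tuples(
    [('Foo', *old) for old in df.index],
    names=['FirstLevel', *df.index.names],
)

print(df)
```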