Python 2.7: Renaming index values in a MultiIndex DataFrame

python-2.7 pandas

Creating my DataFrame:

import numpy as np
from pandas import DataFrame, MultiIndex

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

# Pair up the two label lists: ('bar', 'one'), ('bar', 'two'), ...
tuples = list(zip(*arrays))

index = MultiIndex.from_tuples(tuples, names=['first', 'second'])
data = DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

data
Out[68]: 
                    c1        c2
first second                    
bar   one     0.833816 -1.529639
      two     0.340150 -1.818052
baz   one    -1.605051 -0.917619
      two    -0.021386 -0.222951
foo   one     0.143949 -0.406376
      two     1.208358 -2.469746
qux   one    -0.345265 -0.505282
      two     0.158928  1.088826

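I want to rename the values of the "first" index level, e.g. "bar" -> "cat", "baz" -> "dog", and so on. However, every example I have read either operates on a single-level index or loops over the whole index, effectively recreating it from scratch. What I had in mind was something like: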

data = data.reindex(index={'bar':'cat','baz':'dog'})

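But that doesn't work, and I wouldn't really expect it to work on a MultiIndex. Can I do a replacement like this without looping over the entire DataFrame index?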

Begin edit

I was hesitant to update to 0.13 before its release, so I used the following workaround:

# codes is a previously defined dict mapping old labels to new ones
index = data.index.tolist()
for r in xrange(len(index)):
    index[r] = (codes[index[r][0]], index[r][1])

index = MultiIndex.from_tuples(index, names=data.index.names)
data.index = index

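where codes is a previously defined dict of code: string pairs. As it turns out, the performance hit was not as big as I expected (a few seconds to process 1.1 million rows). It's not as pretty as a one-liner, but it does work.

End edit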
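For readers on Python 3 and a recent pandas, the same workaround can be written without an explicit loop over the tuples. This is only a sketch: the `codes` dict below is a stand-in for the mapping mentioned in the edit, and keeping unmapped labels unchanged via `codes.get(v, v)` is an assumption rather than something the original loop does.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Hypothetical mapping; labels that are not listed stay as they are.
codes = {'bar': 'cat', 'baz': 'dog'}

# Translate only the 'first' level, then rebuild the index from both levels.
first = data.index.get_level_values('first').map(lambda v: codes.get(v, v))
second = data.index.get_level_values('second')
data.index = pd.MultiIndex.from_arrays([first, second],
                                       names=list(data.index.names))
```

This still rebuilds the whole index, so it is a tidier spelling of the workaround rather than a different technique.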

Use the set_levels method (the exact call appears in the runnable example below), which yields:

                    c1        c2
first second                    
cat   one    -0.289649 -0.870716
      two    -0.062014 -0.410274
dog   one     0.030171 -1.091150
      two     0.505408  1.531108
foo   one     1.375653 -1.377876
      two    -1.478615  1.351428
qux   one     1.075802  0.532416
      two     0.865931 -0.765292

To remap a level based on a dict, you could use a function such as this:

def map_level(df, dct, level=0):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                      for i, names in enumerate(index.levels)], inplace=True)

dct = {'bar':'cat', 'baz':'dog'}
map_level(data, dct, level=0)
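As far as I know, newer pandas releases no longer accept `inplace=True` in `set_levels`, so the helper above may fail there. A possible variant, sketched under that assumption, returns a renamed copy instead of mutating the frame; the return-a-copy behaviour is a design choice of this sketch, not part of the original answer.

```python
import numpy as np
import pandas as pd

def map_level(df, dct, level=0):
    """Return a copy of df with the values of one index level remapped via dct."""
    # Values missing from dct are kept unchanged.
    new_values = [dct.get(item, item) for item in df.index.levels[level]]
    out = df.copy()
    # set_levels returns a new MultiIndex; only the chosen level is replaced.
    out.index = df.index.set_levels(new_values, level=level)
    return out

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

renamed = map_level(data, {'bar': 'cat', 'baz': 'dog'}, level=0)
```

Passing `level=` means the other levels never have to be listed, which also addresses the case of an index with more than two levels.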

Here is a runnable example:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = zip(*arrays)
index = pd.MultiIndex.from_tuples(tuples, names=['first','second'])
data = pd.DataFrame(np.random.randn(8,2),index=index,columns=['c1','c2'])
data2 = data.copy()

data.index.set_levels([[u'cat', u'dog', u'foo', u'qux'], 
                       [u'one', u'two']], inplace=True)
print(data)
#                     c1        c2
# first second                    
# cat   one     0.939040 -0.748100
#       two    -0.497006 -1.185966
# dog   one    -0.368161  0.050339
#       two    -2.356879 -0.291206
# foo   one    -0.556261  0.474297
#       two     0.647973  0.755983
# qux   one    -0.017722  1.364244
#       two     1.007303  0.004337

def map_level(df, dct, level=0):
    index = df.index
    index.set_levels([[dct.get(item, item) for item in names] if i==level else names
                      for i, names in enumerate(index.levels)], inplace=True)
dct = {'bar':'wolf', 'baz':'rabbit'}
map_level(data2, dct, level=0)
print(data2)
#                      c1        c2
# first  second                    
# wolf   one     0.939040 -0.748100
#        two    -0.497006 -1.185966
# rabbit one    -0.368161  0.050339
#        two    -2.356879 -0.291206
# foo    one    -0.556261  0.474297
#        two     0.647973  0.755983
# qux    one    -0.017722  1.364244
#        two     1.007303  0.004337
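On later pandas versions, `DataFrame.rename` accepts a `level` argument, which gets close to the one-liner the question asks for. The sketch below assumes such a version is installed and reuses the 'wolf'/'rabbit' mapping from the example above.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
data = pd.DataFrame(np.random.randn(8, 2), index=index, columns=['c1', 'c2'])

# Rename labels in the 'first' level only; 'second' is left untouched.
renamed = data.rename(index={'bar': 'wolf', 'baz': 'rabbit'}, level='first')
print(renamed.index.get_level_values('first').tolist())
```

`rename` returns a new DataFrame here, so `data` keeps its original labels.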

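The set_levels method caused problems with my new column names. So I found a different solution that isn't very clean, but works well. The approach is to print df.index (or, equivalently, df.columns) and then copy and paste the output with the desired values changed. For example: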

print data.index
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['one', 'two']], labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]], names=['first', 'second'])

data.index = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
                                ['one', 'twooo', 'three', 'four',
                                 'five', 'siz', 'seven', 'eit']],
                        labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
                        names=['first', 'second'])

We can also take full control over the names by editing the labels as well. For example:

print data.index

data.index = MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'],
                                ['one', 'twooo', 'three', 'four',
                                 'five', 'siz', 'seven', 'eit']],
                        labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
                        names=['first', 'second'])
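In current pandas the constructor keyword `labels` has, to my knowledge, been renamed to `codes`. A sketch of the same call on a recent version, showing how each code selects a position in the matching `levels` list:

```python
import pandas as pd

new_index = pd.MultiIndex(
    levels=[['bar', 'baz', 'foo', 'qux'],
            ['one', 'twooo', 'three', 'four', 'five', 'siz', 'seven', 'eit']],
    # Each integer is a position in the corresponding levels list above.
    codes=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 2, 3, 4, 5, 6, 7]],
    names=['first', 'second'])

# Row i is (levels[0][codes[0][i]], levels[1][codes[1][i]]).
print(new_index.tolist())
```

Because the second code list runs from 0 to 7, every row receives its own second-level name, which is what "full control" amounts to here.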

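Note that in this example we have already done something like from pandas import MultiIndex or from pandas import *.

Comments:

This is an enhancement proposal for a future version of pandas: (@unutbu's solution works ATM)

0.13 is still in development, and I'm still running 0.12.0. Is there any indication of how stable 0.13.x is?

I haven't seen much documentation on .index.set_levels. In the example above, setting the levels is simple because we only have two levels. Is it possible to pass a dict to replace the values in just one index level, without having to touch (or specify) the values of the other levels?

This works fine for me in 0.16.2 and 0.18.1.

I had the same problem: set_levels puts the new column names out of order. I think it places the new column names based on the MultiIndex's previous "labels" argument. Nice workaround.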
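The ordering problem described in the last comment can be reproduced on a small index. This sketch assumes a recent pandas; the names 'zeta', 'alpha', 'A' and 'Z' are made up for illustration. `from_tuples` stores each level's values sorted, and `set_levels` replaces them position by position in that stored order, not in order of appearance.

```python
import pandas as pd

# 'zeta' appears first, but the stored level is sorted: ['alpha', 'zeta'].
idx = pd.MultiIndex.from_tuples([('zeta', 1), ('alpha', 2)],
                                names=['name', 'num'])
print(list(idx.levels[0]))

# New values are matched to the stored (sorted) positions,
# so 'A' replaces 'alpha' and 'Z' replaces 'zeta'.
renamed = idx.set_levels(['A', 'Z'], level='name')
print(renamed.tolist())
```

Checking `index.levels` before calling `set_levels`, or using a dict-based mapping, avoids this kind of surprise.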