Python: how to use ix to index into a MultiIndex

python pandas

I have set up the following code:

import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee' ]]
tuples = list(zip(*arrays))  # one (A, B, C) tuple per row
index = pd.MultiIndex.from_tuples(tuples, names=['A', 'B', 'C'])
df = pd.DataFrame(np.random.randn(8, 4), index=index)

df
Out[161]: 
                    0         1         2         3
A   B   C                                          
bar one aaa  0.682220 -0.598889 -0.600635 -0.488069
    two bbb -0.134557  1.614224 -0.191303  0.073813
baz one ccc -1.006877 -0.137264 -0.319274  1.465952
    two ccc  0.107222  0.358468  0.165108 -0.258715
foo one ddd  0.360562  1.759095 -1.385394 -0.646850
    two eee -1.113520  0.221483  2.226704 -0.994636
qux one eee -0.609271 -0.888330  0.824189  1.772536
    two eee -0.008346 -0.688091  0.263303  1.242485
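As an aside, the zip/from_tuples step can be replaced by pd.MultiIndex.from_arrays, which takes the per-level lists directly. A minimal sketch, assuming a recent pandas; np.arange stands in for the random data only so that the result is reproducible:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
names = ['A', 'B', 'C']

# The same three-level index, built from row tuples and from per-level arrays
from_tuples = pd.MultiIndex.from_tuples(list(zip(*arrays)), names=names)
from_arrays = pd.MultiIndex.from_arrays(arrays, names=names)

df = pd.DataFrame(np.arange(32).reshape(8, 4), index=from_arrays)
print(from_tuples.equals(from_arrays))  # True
```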

I want to find the matching rows based on a combination of conditions on the A, B and C levels.

e.g., in SQL terms: SELECT * WHERE A IN ('foo', 'qux') AND C = 'eee'
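The SQL condition maps fairly directly onto DataFrame.query, which lets named index levels appear in the expression as if they were columns. This is not a suggestion from the original thread; it is a minimal sketch, assuming a recent pandas, with deterministic np.arange values in place of the random data:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
index = pd.MultiIndex.from_arrays(arrays, names=['A', 'B', 'C'])
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index)

# Index level names A and C can be referenced like columns inside query()
result = df.query("A in ['foo', 'qux'] and C == 'eee'")
print(result)
```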

Can I do this with ix? e.g., something like:

df.ix(['foo', 'qux'],:,'eee')

What is the idiomatic way to do this for very large datasets?

(I am currently using pandas 0.7, but can upgrade if absolutely necessary.)
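For very large frames, one common vectorized approach (not one given in the thread) is to build a boolean mask from the level values with Index.get_level_values and Index.isin, which avoids any per-row Python loop. A minimal sketch, assuming a recent pandas and the same index layout as above:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
index = pd.MultiIndex.from_arrays(arrays, names=['A', 'B', 'C'])
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index)

# Flat, row-aligned Index objects holding each row's A and C labels
a = df.index.get_level_values('A')
c = df.index.get_level_values('C')

# SQL: WHERE A IN ('foo', 'qux') AND C = 'eee'
mask = a.isin(['foo', 'qux']) & (c == 'eee')
result = df[mask]
print(result)
```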

You can pass a tuple of selectors to df.loc to slice the MultiIndex:

In [782]: df.loc[(['foo','qux'], slice(None), 'eee'), :]
Out[782]: 
                    0         1         2         3
A   B   C                                          
foo two eee  1.615853 -1.327626  0.772608 -0.406398
qux one eee  0.472890  0.746417  0.095389 -1.446869
    two eee  0.711915  0.228739  1.763126  0.558106
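The same tuple of selectors can be spelled with pd.IndexSlice, which allows : in place of slice(None). A minimal sketch, assuming a recent pandas; sorting the index first is a precaution, since slicing a MultiIndex that is not lexsorted can raise an UnsortedIndexError:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
index = pd.MultiIndex.from_arrays(arrays, names=['A', 'B', 'C'])
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index).sort_index()

idx = pd.IndexSlice
# Equivalent to df.loc[(['foo', 'qux'], slice(None), 'eee'), :]
result = df.loc[idx[['foo', 'qux'], :, 'eee'], :]
print(result)
```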

I would write a function to do this kind of thing:

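import numpy as np

def ms(df, *args):
    # One selector per index level; None means "any value" for that level
    idx = df.index
    for i, values in enumerate(args):
        if values is not None:
            if np.isscalar(values):
                values = [values]
            # Restrict the index to entries whose level-i value is in `values`
            idx = idx.reindex(values, level=i)[0]
    return df.loc[idx]  # df.ix[idx] in the original; .ix was removed in pandas 1.0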

Then you can easily do:

ms(df, ['foo', 'qux'], None, "eee")
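On current pandas, where .ix no longer exists, a similar helper can be written with boolean masks instead of reindex. This is a hypothetical variant (the name ms_mask is made up for illustration), assuming selectors are given per level in order and None means any value; pd.api.types.is_scalar is used for the scalar check so that datetime labels also count as single values:

```python
import numpy as np
import pandas as pd


def ms_mask(df, *args):
    """Hypothetical mask-based take on ms(): one selector per index level.

    None matches every value, a scalar matches one value, and a list-like
    matches any of its values.
    """
    mask = np.ones(len(df), dtype=bool)
    for level, values in enumerate(args):
        if values is None:
            continue
        if pd.api.types.is_scalar(values):
            values = [values]
        mask &= df.index.get_level_values(level).isin(values)
    return df[mask]


arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
index = pd.MultiIndex.from_arrays(arrays, names=['A', 'B', 'C'])
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index)

result = ms_mask(df, ['foo', 'qux'], None, 'eee')
print(result)
```

Trailing levels can simply be omitted, so ms_mask(df, 'bar') selects on level A alone.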

This hurts my head... The closest I could find is

df.ix['foo':'qux'].xs('eee', level='C')

...or

df.ix[['foo', 'qux']].xs('eee', level='C')
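One detail of the xs route: by default xs removes the level it selected on, so the result above is indexed by A and B only. Passing drop_level=False keeps C. A minimal sketch, assuming a recent pandas, with .loc standing in for .ix:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
index = pd.MultiIndex.from_arrays(arrays, names=['A', 'B', 'C'])
df = pd.DataFrame(np.arange(32).reshape(8, 4), index=index)

subset = df.loc[['foo', 'qux']]                        # rows whose A label is foo or qux
dropped = subset.xs('eee', level='C')                  # level C removed from the index
kept = subset.xs('eee', level='C', drop_level=False)   # level C retained

print(list(dropped.index.names))  # ['A', 'B']
print(list(kept.index.names))     # ['A', 'B', 'C']
```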

This is great, thanks a lot.

What does ms stand for? And how would you feel about changing if not isinstance(values, (tuple, list)) to if isinstance(values, basestring)?

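ms stands for multi-index select. Because the values in an index may be int, float or datetime, checking only for basestring is not general enough. I thought it would be better to use not isinstance(values, collections.Iterable), because pandas can convert any iterable object to an Index. But str is also iterable, so I changed it to use numpy.isscalar(values).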
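The datetime point matters for the scalar check itself: np.isscalar recognizes NumPy scalars, a fixed set of builtin types such as str, int and float, and numbers.Number instances, so it returns False for datetime.datetime and pd.Timestamp, and a single date passed to ms would not be wrapped in a list. pandas provides pd.api.types.is_scalar, which does treat these as scalars. (In Python 3, basestring no longer exists; str is the closest equivalent.) A small comparison, assuming a recent NumPy and pandas:

```python
import datetime

import numpy as np
import pandas as pd

samples = {
    'str': 'eee',
    'int': 3,
    'list': ['foo', 'qux'],
    'datetime': datetime.datetime(2013, 1, 1),
    'Timestamp': pd.Timestamp('2013-01-01'),
}

# Compare NumPy's and pandas' notion of "scalar" for typical index labels
checks = {name: (np.isscalar(v), pd.api.types.is_scalar(v))
          for name, v in samples.items()}
for name, (np_result, pd_result) in checks.items():
    print(f'{name:10} np.isscalar={np_result!s:5} is_scalar={pd_result}')
```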

In this case, is using reindex equivalent to using xs? And how do you feel about return df.reindex(idx) instead of return df.ix[idx]?
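On the last two questions: when every requested label exists, df.reindex(idx) and df.loc[idx] return the same rows, but reindex silently adds an all-NaN row for a label that is not in the frame (whereas .loc raises a KeyError for missing labels in pandas 1.0 and later). Neither is quite xs, which drops the level it selects on. A minimal sketch, assuming a recent pandas; float values are used for simplicity:

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'],
          ['aaa', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'eee']]
names = ['A', 'B', 'C']
index = pd.MultiIndex.from_arrays(arrays, names=names)
df = pd.DataFrame(np.arange(32.0).reshape(8, 4), index=index)

wanted = pd.MultiIndex.from_tuples(
    [('foo', 'two', 'eee'), ('qux', 'one', 'eee')], names=names)

# All requested labels exist: reindex and loc agree
same = df.reindex(wanted).equals(df.loc[wanted])

# ('foo', 'one', 'eee') is not in df: reindex returns a row of NaN
missing = pd.MultiIndex.from_tuples([('foo', 'one', 'eee')], names=names)
padded = df.reindex(missing)

# xs selects on one level and drops it, leaving a two-level index
via_xs = df.xs('eee', level='C')

print(same)                  # True
print(padded)
print(via_xs.index.nlevels)  # 2
```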