Python: Accessing one level of a MultiIndex

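I have a DataFrame that looks like a simple use case for a MultiIndex: I have ISO week numbers and dates as the index, and I want to filter by a specific week. Following the documentation, it looks like I should be able to index by passing a week-number string. However, this gives me a KeyError.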

MCVE：
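The example code itself is not shown here. As a rough stand-in, the sketch below builds a frame shaped like the output printed later on this page (week label '2016_32', seven daily dates, columns foo and bar). The construction is an assumption, not the asker's code. It illustrates why passing the week string to plain [] raises a KeyError: the key is looked up among the column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction: one week label and seven daily dates
dates = pd.date_range('2016-08-07', periods=7, freq='D')
index = pd.MultiIndex.from_product([['2016_32'], dates])
df = pd.DataFrame({'foo': np.full(7, 1 / 7), 'bar': np.full(7, np.nan)},
                  index=index)

# df[key] searches the column labels ('foo', 'bar'), not the row index
try:
    df['2016_32']
    raised = False
except KeyError:
    raised = True

print(raised)
# Selecting the same label through loc looks in the row index instead
print(df.loc['2016_32'])
```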

In general, to select from a MultiIndex, use xs or loc:

# no extra argument is needed to select on the first level
print(df.loc['2016_32'])
# to select on the second level, use axis=0 and ':' to select all values of the first level
print(df.loc(axis=0)[:, '2016-09-07'])
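The same two selections can also be written with xs and its level argument. The following is a minimal sketch on a hypothetical frame (the dates, the single foo column and the variable names are assumptions made for illustration), not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: week label on level 0, daily dates on level 1
dates = pd.date_range('2016-08-07', periods=7, freq='D')
index = pd.MultiIndex.from_product([['2016_32'], dates])
df = pd.DataFrame({'foo': np.full(7, 1 / 7)}, index=index)

# Cross-section on the first level: all rows of week '2016_32'
by_week = df.xs('2016_32', level=0)

# Cross-section on the second level: the row(s) for a single date
by_date = df.xs(pd.Timestamp('2016-08-09'), level=1)

print(by_week)
print(by_date)
```

By default xs drops the level that was selected on, so by_week is indexed by date only and by_date by week only.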

The difference between selecting from a MultiIndex in the columns and in the rows:

import numpy as np
import pandas as pd

np.random.seed(235)
# two-level column labels: (bar, baz, foo, qux) x (one, two)
a = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
     np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
# two-level row labels: (A, B, C) x (E, F)
a1 = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E', 'F']])
df = pd.DataFrame(np.random.randint(10, size=(6, 8)), index=a1, columns=a)
print(df)
    bar     baz     foo     qux    
    one two one two one two one two
A E   8   1   5   8   3   5   3   3
  F   3   1   3   6   6   1   0   2
B E   0   3   1   7   0   0   8   2
  F   6   7   7   4   2   7   7   5
C E   7   3   1   7   3   9   7   3
  F   8   2   0   8   5   2   2   0

To select by rows (MultiIndex in the index), use loc:

print (df.loc['A'])
  bar     baz     foo     qux    
  one two one two one two one two
E   8   1   5   8   3   5   3   3
F   3   1   3   6   6   1   0   2

print (df.loc['A'].loc['F'])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: F, dtype: int32

print (df.loc[('A', 'F')])
bar  one    3
     two    1
baz  one    3
     two    6
foo  one    6
     two    1
qux  one    0
     two    2
Name: (A, F), dtype: int32
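To put the column case next to the row case, here is a small sketch on a frame with the same labels as the sample above. The values are a deterministic np.arange fill instead of random numbers, and the variable names are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([['bar', 'baz', 'foo', 'qux'], ['one', 'two']])
rows = pd.MultiIndex.from_product([['A', 'B', 'C'], ['E', 'F']])
df = pd.DataFrame(np.arange(48).reshape(6, 8), index=rows, columns=cols)

# [] works on the columns: first level, or a full (level0, level1) tuple
bar = df['bar']                # DataFrame with columns 'one' and 'two'
bar_one = df[('bar', 'one')]   # Series with one value per row

# loc works on the rows: first level, or a full (level0, level1) tuple
a_rows = df.loc['A']           # DataFrame with index 'E' and 'F'
a_f = df.loc[('A', 'F')]       # Series with one value per column

# A row label passed to [] is looked up in the columns and fails
try:
    df['A']
    row_label_in_brackets_fails = False
except KeyError:
    row_label_in_brackets_fails = True
```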

Alternatively, you can use swaplevel without changing the order of the levels:

>>> df[:7].swaplevel(0, 0, axis=0)
                         foo  bar
2016_32 2016-08-07  0.142857  NaN
        2016-08-08  0.142857  NaN
        2016-08-09  0.142857  NaN
        2016-08-10  0.142857  NaN
        2016-08-11  0.142857  NaN
        2016-08-12  0.142857  NaN
        2016-08-13  0.142857  NaN

Or simply:

>>> df[1:7]
                         foo  bar
2016_32 2016-08-08  0.142857  NaN
        2016-08-09  0.142857  NaN
        2016-08-10  0.142857  NaN
        2016-08-11  0.142857  NaN
        2016-08-12  0.142857  NaN
        2016-08-13  0.142857  NaN
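Both snippets above pick rows by position rather than by week label, so they rely on knowing where the week's rows sit in the frame. A short sketch on a hypothetical two-week frame (the data and names are assumptions) shows that swapping level 0 with itself leaves the frame unchanged, and that integer slicing with [] matches iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical frame covering two weeks of daily dates
dates = pd.date_range('2016-08-07', periods=14, freq='D')
weeks = ['2016_32'] * 7 + ['2016_33'] * 7
index = pd.MultiIndex.from_arrays([weeks, dates])
df = pd.DataFrame({'foo': np.full(14, 1 / 7)}, index=index)

first_week = df[:7]

# Swapping a level with itself does not change the index
same = first_week.swaplevel(0, 0, axis=0).equals(first_week)

# An integer slice in [] selects rows by position, like iloc
positional = df[1:7].equals(df.iloc[1:7])

print(same, positional)
```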

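Thanks, this indeed works! Can you explain how this differs from the documentation?

@Josh Friedlander - I think there is a difference: df['col'] selects the first level of a MultiIndex in the columns, whereas in your example the MultiIndex is in the index, so loc is needed. df.xs is a cross-section method; the xs() method of DataFrame additionally takes a level argument to make selecting data at a particular level of a MultiIndex easier.

@jezrael, nice one! Got it. In the documentation example, the term bar is both a column name and a level name, which is what confused me.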
