Python: how to perform conditional selection on a MultiIndex


python pandas


Here is what I ran in an IPython notebook:

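import numpy as np
import pandas as pd

!curl -O http://pbpython.com/extras/sales-funnel.xlsx

df = pd.read_excel('./sales-funnel.xlsx')
df['Status'] = df['Status'].astype('category')
# set_categories returns a new Series; newer pandas versions no longer accept inplace=True here
df['Status'] = df['Status'].cat.set_categories(["won","pending","presented","declined"])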

table = pd.pivot_table(df,
               index=['Manager', 'Status'],
               values=['Price', 'Quantity'],
               columns=['Product'],
               aggfunc={'Price':[np.sum, np.mean], 'Quantity':len},
               fill_value=0
              )

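The resulting table has a two-level row index (Manager, Status) and three-level columns (value, aggregation function, Product).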

I want to select the rows where (Manager == "Debra Henley") & (Status == "won"). This works with the query method:
table.query('(Manager=="Debra Henley") & (Status=="won")')

But how do I make the same selection with loc? I tried this, but it doesn't work:
table.loc[['Debra Henley', 'won']]

What do you normally use when working with a MultiIndex? What is the best way to do this?

Update: I have found two solutions so far:
table.xs(('Debra Henley','won'), level=('Manager', 'Status'))
table.loc[[('Debra Henley', 'won')]]

So I guess that when indexing into a MultiIndex I should use tuples rather than lists?
Yes. You can use:
table.loc[[('Debra Henley', 'won')]]

to return a DataFrame, or you can use:
table.loc[('Debra Henley','won')]

to return a pandas Series.
You can refer to the pandas documentation.
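
To make the difference between these key types concrete, here is a minimal sketch. The frame toy is a hypothetical stand-in for the pivot table (the spreadsheet is not reproduced here and the numbers are made up); only the shape of the row index mirrors the question.

```python
import pandas as pd

# Hypothetical stand-in for the pivot table; the numbers are made up.
rows = pd.MultiIndex.from_product(
    [["Debra Henley", "Fred Anderson"], ["declined", "pending", "won"]],
    names=["Manager", "Status"],
)
toy = pd.DataFrame(
    {
        "Price": [0, 40000, 65000, 5000, 30000, 10000],
        "Quantity": [2, 3, 1, 1, 2, 1],
    },
    index=rows,
)

# Tuple: one label per level; the single row comes back as a Series.
as_series = toy.loc[("Debra Henley", "won")]

# List holding one tuple: the same row, kept as a one-row DataFrame.
as_frame = toy.loc[[("Debra Henley", "won")]]

# List of tuples: several (Manager, Status) pairs at once.
pairs = toy.loc[[("Debra Henley", "won"), ("Fred Anderson", "pending")]]

# List of plain labels: matched against the first level only.
managers = toy.loc[["Debra Henley", "Fred Anderson"]]

print(type(as_series).__name__, type(as_frame).__name__)
print(len(pairs), len(managers))
```

This is also why a flat list such as ['Debra Henley', 'won'] does not select one row: both entries are looked up as Manager labels.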
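For simpler selections (index only or columns only), use the xs method or select by tuple.
A more general solution uses slicers (pd.IndexSlice).

It pays off for more complicated selections, for example when you need to filter the index and the columns at the same time, which a single xs call cannot do: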
idx = pd.IndexSlice
#select all rows where first level is Debra Henley in index and 
#in columns second level is len and sum
print (table.loc[idx['Debra Henley',:], idx[:, ['len', 'sum'], :]])
                       Quantity                               Price  \
                            len                                 sum   
Product                     CPU Maintenance Monitor Software    CPU   
Manager      Status                                                   
Debra Henley won              1           0       0        0  65000   
             pending          1           2       0        0  40000   
             presented        1           0       0        2  30000   
             declined         2           0       0        0  70000   



Product                Maintenance Monitor Software  
Manager      Status                                  
Debra Henley won                 0       0        0  
             pending         10000       0        0  
             presented           0       0    20000  
             declined            0       0        0     
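
The same slicer pattern can be tried without the spreadsheet. The sketch below assumes a small hypothetical frame (toy, filled with consecutive integers) whose row and column levels imitate the pivot table; both indexes are built in sorted order, which slicing with pd.IndexSlice relies on.

```python
import numpy as np
import pandas as pd

# Hypothetical frame imitating the pivot table's layout; values are 0..31.
rows = pd.MultiIndex.from_product(
    [["Debra Henley", "Fred Anderson"], ["pending", "won"]],
    names=["Manager", "Status"],
)
cols = pd.MultiIndex.from_product(
    [["Price", "Quantity"], ["len", "sum"], ["CPU", "Software"]],
    names=[None, None, "Product"],
)
toy = pd.DataFrame(np.arange(32).reshape(4, 8), index=rows, columns=cols)

idx = pd.IndexSlice

# Rows: first level equals "Debra Henley", any Status.
# Columns: any first level, second level "sum", Product "CPU".
subset = toy.loc[idx["Debra Henley", :], idx[:, ["sum"], "CPU"]]
print(subset)
```

Each position in idx[...] corresponds to one index level, and a bare : means "keep everything on this level".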

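The canonical answer is the one given by @ScottBoston.
I'll add this for breadth and perspective, alongside @jezrael's IndexSlice approach.

You can also take a cross section with xs. Pass the key as a tuple, since newer pandas versions reject a list key:
table.xs(('Debra Henley', 'won'))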

                Product    
Quantity  len   CPU                1
                Maintenance        0
                Monitor            0
                Software           0
Price     mean  CPU            65000
                Maintenance        0
                Monitor            0
                Software           0
          sum   CPU            65000
                Maintenance        0
                Monitor            0
                Software           0
Name: (Debra Henley, won), dtype: int64
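
As a small illustration of what xs returns for different keys, here is a sketch on a hypothetical one-column frame (toy, with made-up prices). It shows a full tuple key, a key on a named level, and the drop_level argument.

```python
import pandas as pd

# Hypothetical stand-in for the pivot table; the prices are made up.
rows = pd.MultiIndex.from_product(
    [["Debra Henley", "Fred Anderson"], ["pending", "won"]],
    names=["Manager", "Status"],
)
toy = pd.DataFrame({"Price": [40000, 65000, 30000, 5000]}, index=rows)

# Full key as a tuple: a single row, returned as a Series.
row = toy.xs(("Debra Henley", "won"))

# Key on one named level: a DataFrame indexed by the remaining level.
won = toy.xs("won", level="Status")

# Same selection, but keeping the Status level in the result.
won_full = toy.xs("won", level="Status", drop_level=False)

print(row)
print(won)
print(won_full)
```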

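To add a little more colour to this answer: pandas uses tuples to refer to the elements of a pd.MultiIndex. In this case .loc is the faster method, because your data is already in the right form.

You're absolutely right that multi-level indexing has to be done with tuples (or at least definitely not with lists). Lists (or numpy arrays) select data on the same "level": if you think of indexing with a tuple as going deeper into the MultiIndex, then a list goes wider. For example, compare what you get from table.loc[['Debra Henley', 'Fred Anderson']] and table.loc[[('Debra Henley', 'won'), ('Fred Anderson', 'pending')]].

@KenWei "your data is already in the right form": what do you mean by the right form?

Sorry I wasn't clear. It is best explained with an example: suppose you want the pending rows for every manager. You can use table.xs('pending', level='Status'), whereas with .loc you would first have to swap the Status and Manager levels and then do table.loc['pending'] (so I'd describe this case as the data being in the "wrong" form). (Basically, if you don't need to supply the level argument, .loc will probably give you the same result.) There are also some subtleties about which index levels are kept in the output, depending on whether .loc is given a tuple, a list of tuples, or a string, and likewise for .xs()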
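
The "wrong form" case from the last comment can be sketched as follows. The frame toy is a hypothetical stand-in with made-up prices; the comparison assumes the swapped index is sorted before the .loc lookup.

```python
import pandas as pd

# Hypothetical stand-in for the pivot table; the prices are made up.
rows = pd.MultiIndex.from_product(
    [["Debra Henley", "Fred Anderson"], ["pending", "won"]],
    names=["Manager", "Status"],
)
toy = pd.DataFrame({"Price": [40000, 65000, 30000, 5000]}, index=rows)

# xs can target the inner level directly.
via_xs = toy.xs("pending", level="Status")

# .loc looks at the outermost level first, so Status has to be moved
# to the front (and the index re-sorted) before the same lookup works.
swapped = toy.swaplevel("Manager", "Status").sort_index()
via_loc = swapped.loc["pending"]

print(via_xs)
print(via_loc)
```

Both results are indexed by Manager only, because the selected Status level is dropped in each case.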