Python: Why does slicing with .loc[] raise an error after grouping and aggregating some columns of a DataFrame?

I am a SAS user, and I am doing some data manipulation in Python:

isc_summary_sales = isc.groupby(['country', 'sales_channel', 'item_type'], as_index=False).aggregate({'order_id': ['count'], 'units_sold': ['sum'], 'unit_cost': ['mean'], 'unit_price': ['mean'], 'total_cost': ['sum']})

The code above works fine, but when I try to slice the result with

isc_summary_sales.loc[:,'country':'total_cost']

I get an error:

UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'

However, using

isc_summary_sales.iloc[:, 0:7]

works fine.

I don't understand what the error means. Why does this happen?

It throws that error because, after the aggregation, your columns have a two-level index (a MultiIndex).

For example:

import pandas as pd
df = pd.DataFrame({"a":[1, 1, 1, 2, 3, 2], "b":[1, 1, 3, 1, 2, 4], "c":[1, 2, 3, 1, 2, 4], "d":[1, 2, 3, 1, 2, 4]})
df_summary = df.groupby(["a", "b"], as_index=False).aggregate({"c":["mean", "sum"], "d":['sum']})
print(df_summary)

   a  b    c       d
        mean sum sum
0  1  1  1.5   3   3
1  1  3  3.0   3   3
2  2  1  1.0   1   1
3  2  4  4.0   4   4
4  3  2  2.0   2   2
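
To make the two levels visible, the column labels can be inspected directly. This is a minimal sketch that reuses the example frame above; the names column_labels and c_mean are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 3, 2], "b": [1, 1, 3, 1, 2, 4],
                   "c": [1, 2, 3, 1, 2, 4], "d": [1, 2, 3, 1, 2, 4]})
df_summary = df.groupby(["a", "b"], as_index=False).aggregate(
    {"c": ["mean", "sum"], "d": ["sum"]}
)

# Each column label is now a tuple: (original column, aggregation name).
# The grouping keys "a" and "b" get an empty string as their second element.
column_labels = list(df_summary.columns)
print(type(df_summary.columns).__name__)
print(df_summary.columns.nlevels)
print(column_labels)

# A single aggregated column is addressed with its full tuple label.
c_mean = df_summary[("c", "mean")]
print(c_mean)
```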

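As you can see, you no longer have the simple columns "a", "b", "c" and "d"; you have multi-level columns instead. It seems that label slicing with "loc" requires the columns to be lexicographically sorted, and aggregating the original DataFrame created new columns that are no longer sorted. However, we can sort them again as follows: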

# sortlevel() was removed in pandas 1.0; sort_index() is its replacement
df_summary = df_summary.sort_index(axis=1)

# And now this works
print(df_summary.loc[:, "b" : "d"])
   b    c       d
     mean sum sum
0  1  1.5   3   3
1  3  3.0   3   3
2  1  1.0   1   1
3  4  4.0   4   4
4  2  2.0   2   2
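
One caveat when applying this to the frame from the question: sorting reorders the columns alphabetically, so a label slice may no longer cover the same columns as the positional iloc slice. The sketch below uses a hypothetical miniature of the isc table (the values and the reduced set of columns are invented for illustration) to show the effect.

```python
import pandas as pd

# Hypothetical miniature of the asker's `isc` table; values are made up.
isc = pd.DataFrame({
    "country": ["US", "US", "FR"],
    "sales_channel": ["Online", "Online", "Offline"],
    "item_type": ["Toys", "Toys", "Food"],
    "units_sold": [10, 5, 7],
    "total_cost": [100.0, 50.0, 70.0],
})

summary = isc.groupby(
    ["country", "sales_channel", "item_type"], as_index=False
).agg({"units_sold": ["sum"], "total_cost": ["sum"]})
original_order = [col[0] for col in summary.columns]

# Sorting makes label slicing possible, but it also changes the column order.
sorted_summary = summary.sort_index(axis=1)
sorted_order = [col[0] for col in sorted_summary.columns]

# "units_sold" now sorts after "total_cost", so it falls outside the slice.
sliced = sorted_summary.loc[:, "country":"total_cost"]
sliced_order = [col[0] for col in sliced.columns]

print(original_order)
print(sorted_order)
print(sliced_order)
```

If the original column order matters, flattening the columns to a single level before slicing avoids the reordering.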

You may also want to flatten the columns to a single level. I would do it like this:

# Keep the bare name when the second level is empty, otherwise join both levels with "_"
df_summary.columns = [col[0] if col[1] == '' else '_'.join(col) for col in df_summary.columns]

# Which makes my DataFrame look like this
print(df_summary)
   a  b  c_mean  c_sum  d_sum
0  1  1     1.5      3      3
1  1  3     3.0      3      3
2  2  1     1.0      1      1
3  2  4     4.0      4      4
4  3  2     2.0      2      2
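
As an alternative that is not part of the original answer, named aggregation produces flat column names directly, so no MultiIndex is created in the first place. This is a sketch assuming pandas 0.25 or later; the output names c_mean, c_sum and d_sum are chosen here to match the flattened frame above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 3, 2], "b": [1, 1, 3, 1, 2, 4],
                   "c": [1, 2, 3, 1, 2, 4], "d": [1, 2, 3, 1, 2, 4]})

# Each keyword is the output column name; the tuple is (input column, function).
flat_summary = df.groupby(["a", "b"], as_index=False).agg(
    c_mean=("c", "mean"),
    c_sum=("c", "sum"),
    d_sum=("d", "sum"),
)

# With a flat, unique column index, label slicing works without any sorting.
subset = flat_summary.loc[:, "b":"d_sum"]
print(subset)
```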

More information about multi-level indexes can be found here:

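Do you want to select only the "country" and "total_cost" columns, or all the columns between them?

All the columns between them is what I want. It works with iloc but not with loc.

So this question might explain more.

Thank you very much. Detailed and helpful. The snippet makes it very clear. Cheers :)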