Python: slicing by a range of columns in a DataFrame with MultiIndex columns

I am creating a DataFrame by doing the following:

import pandas as pd
from random import randint  # assumed source of randint; the import is not shown

months        = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]

income = []
names  = []

# incomeIndex and expensesIndex are defined earlier (not shown)
for x in range( incomeIndex + 1, expensesIndex ):
    # one random amount per ( month, type ) column
    amounts = [ randint( -1000, 15000 ) for _ in range( len( months ) * len( monthyAmounts ) ) ]
    income.append( amounts )
    names.append( f"name_{x}" )

index    = pd.Index( names, name = 'category' )
columns  = pd.MultiIndex.from_product( [ months, monthyAmounts ], names = [ 'month', 'type' ] )
incomeDF = pd.DataFrame( income, index = index, columns = columns )
idx      = pd.IndexSlice  # assumed; idx is used below but its definition is not shown
The DataFrame looks like this (March through December removed):

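What I want is, for every row, to slice out the difference columns for January through May. What I can do is slice the difference column for all months with: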

incomeDifferenceDF = incomeDF.loc[ :, idx[ :, 'difference' ] ]
This gives me a DataFrame that looks like this (March through December removed):

What I tried is:

incomeDifferenceDF = incomeDF.loc[ :, idx[ 'Jan' : 'May', 'difference' ] ]
But this gives me an error:

UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'
So this seems close, but I am not sure how to fix the problem.

I also tried:

incomeDifferenceDF = incomeDF.loc[ :, idx[ ['Jan':'May'], 'difference' ] ]
But this only produces an error:

SyntaxError: invalid syntax
( Points at ['Jan':'May'] )

What is the best way to do this?

If you need to select by the MultiIndex, boolean masks are needed:

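import pandas as pd

index    = pd.Index( [1,2,3,4], name = 'category' )
# budgetMonths is not used below ( newer pandas versions spell the 'BM' alias 'BME' )
budgetMonths = pd.date_range( "January, 2018", periods = 12, freq = 'BM' )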
months        = [ 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec' ]
monthyAmounts = [ "actual", "budgeted", "difference" ]
columns = pd.MultiIndex.from_product( [ months, monthyAmounts ], names = [ 'month', 'type' ])
incomeDF = pd.DataFrame( 10, index = index, columns = columns )

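# trick to get the month labels from 'Jan' through 'May':
# label slicing works on a flat index (here idx is a plain Index, not pd.IndexSlice)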
idx = pd.Series(0,index=months).loc['Jan' : 'May'].index
print (idx)
Index(['Jan', 'Feb', 'Mar', 'Apr', 'May'], dtype='object')

mask1 = incomeDF.columns.get_level_values(0).isin(idx)
mask2 = incomeDF.columns.get_level_values(1) == 'difference'

incomeDifferenceDF = incomeDF.loc[:, mask1 & mask2]
print (incomeDifferenceDF)
month           Jan        Feb        Mar        Apr        May
type     difference difference difference difference difference
category                                                       
1                10         10         10         10         10
2                10         10         10         10         10
3                10         10         10         10         10
4                10         10         10         10         10
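
A possibly simpler alternative, sketched here rather than taken from the code above: select the 'difference' columns with DataFrame.xs, which drops the 'type' level, and then slice the remaining flat month columns by label. Label slicing on a flat index only needs unique labels, not sorted ones. If both column levels should be kept, an explicit list of months can be passed instead of a slice, since only true slices require a lexsorted MultiIndex. The names differenceDF, janToMayDF, wanted and janToMayMultiDF are illustrative, and the frame filled with 10 is the same stand-in data as above.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
monthyAmounts = ['actual', 'budgeted', 'difference']
columns = pd.MultiIndex.from_product([months, monthyAmounts], names=['month', 'type'])
incomeDF = pd.DataFrame(10, index=pd.Index([1, 2, 3, 4], name='category'), columns=columns)

# Pick the 'difference' columns; xs drops the 'type' level, leaving flat month columns.
differenceDF = incomeDF.xs('difference', axis=1, level='type')

# Label slicing on a flat, unique index follows the existing column order.
janToMayDF = differenceDF.loc[:, 'Jan':'May']
print(janToMayDF)

# Variant that keeps both column levels: build the month list by position
# and pass it as a list, which does not require a lexsorted MultiIndex.
idx = pd.IndexSlice
wanted = months[months.index('Jan'): months.index('May') + 1]
janToMayMultiDF = incomeDF.loc[:, idx[wanted, 'difference']]
print(janToMayMultiDF)
```

The xs route returns single-level columns, which is convenient when only one type is needed; the list route keeps the ( month, type ) layout of the original frame.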

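Hmm, 1-01-01 is not a valid date, but I need to test this. Give me some time.
Take all the time you need... thank you... But perhaps using a time series with specific dates and slicing on that is good enough... But being able to slice by month name (e.g. 'Jan':'May')...
@ericg - if the values are sorted as in the new sample, your solution works nicely.
@ericg - hmmm, if you want to use month names, selection is a problem because they cannot be sorted. So a possible solution is to change the month names to numbers.
@ericg - added a possible solution, but it is complicated: boolean masks are needed.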
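
The point about month names not being sortable can be illustrated with a short sketch (an illustration under stated assumptions, not code from the thread). Sorting the columns removes the UnsortedIndexError, but 'Jan':'May' then covers the alphabetical range of labels rather than the calendar range. Numbering the months, as suggested above, keeps calendar order and leaves the columns lexsorted, so slicing works directly. The names sortedDF, numberedDF, alphabetical and janToMay are hypothetical, and the frames are filled with 10 as stand-in data.

```python
import pandas as pd

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
monthyAmounts = ['actual', 'budgeted', 'difference']
index = pd.Index([1, 2, 3, 4], name='category')
idx = pd.IndexSlice

# Month names: sorting the columns makes slicing legal, but the order becomes alphabetical.
nameColumns = pd.MultiIndex.from_product([months, monthyAmounts], names=['month', 'type'])
sortedDF = pd.DataFrame(10, index=index, columns=nameColumns).sort_index(axis=1)
alphabetical = sortedDF.loc[:, idx['Jan':'May', 'difference']]
print(list(alphabetical.columns.get_level_values('month')))
# ['Jan', 'Jul', 'Jun', 'Mar', 'May']

# Month numbers (1 = Jan ... 12 = Dec): already ascending, so no sorting is needed.
monthNumbers = list(range(1, 13))
numberColumns = pd.MultiIndex.from_product([monthNumbers, monthyAmounts], names=['month', 'type'])
numberedDF = pd.DataFrame(10, index=index, columns=numberColumns)
janToMay = numberedDF.loc[:, idx[1:5, 'difference']]
print(list(janToMay.columns.get_level_values('month')))
# [1, 2, 3, 4, 5]
```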
1-01-01
不是有效日期,但需要对此进行测试。给我一些时间。花你所需要的所有时间…谢谢你…但是,也许使用特定日期的时间序列和切片已经足够好了…但是,能够基于月份名称进行切片(例如“Jan”:“May”)@ericg-如果值像在新示例中那样排序,那么您的解决方案工作得很好。@ericg-hmmm,如果想要使用月份的名称,选择是个问题,因为无法排序。所以可能的解决方案应该是将月份名称改为数字。@ericg-添加了一个可能的解决方案,但很复杂-需要布尔掩码。