Python: Filter a MultiIndex DataFrame across all first-level columns


python pandas dataframe


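I am trying to find an efficient way to filter all entries under both top-level columns, based on a filter that is defined for only one of the top-level columns. This is best explained with the example and desired output below.

Sample DataFrame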

import pandas as pd
import numpy as np

info = ['price', 'year']
months = ['month0','month1','month2']
settlement_dates = ['2020-12-31', '2021-01-01']
Data = [[[2,4,5],[2020,2021,2022]],[[1,4,2],[2021,2022,2023]]]
# One row per settlement date, one column per (info, month) pair
Data = np.array(Data).reshape(len(settlement_dates), len(months) * len(info))
# Two-level columns: info on top, months underneath
midx = pd.MultiIndex.from_product([info, months])
df = pd.DataFrame(Data, index=settlement_dates, columns=midx)
df

            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31      2      4      5   2020   2021   2022
2021-01-01      1      4      2   2021   2022   2023

idx_cols = pd.IndexSlice

df_filter = df.loc[:, idx_cols['year', :]]==2021

df[df_filter]


            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN    NaN    NaN     NaN  2021.0    NaN
2021-01-01    NaN    NaN    NaN  2021.0     NaN    NaN

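The filter created above for the MultiIndex DataFrame covers only the 'year' columns, so every 'price' entry comes back as NaN.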

Desired output:

            price                  year               
           month0 month1 month2  month0  month1 month2
2020-12-31    NaN      4    NaN     NaN  2021.0    NaN
2021-01-01      1    NaN    NaN  2021.0     NaN    NaN

You can reshape the DataFrame to simplify the solution.
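The answer's reshaping code did not survive on this page, so the following is only a minimal sketch of one way the idea could look, assuming the reshape meant is `DataFrame.stack`. The names `long_df`, `kept` and `result` are illustrative and not from the original answer. Stacking moves the month level into the row index, so `price` and `year` become ordinary columns and one row filter applies to both.

```python
import numpy as np
import pandas as pd

info = ['price', 'year']
months = ['month0', 'month1', 'month2']
settlement_dates = ['2020-12-31', '2021-01-01']
data = np.array([[[2, 4, 5], [2020, 2021, 2022]],
                 [[1, 4, 2], [2021, 2022, 2023]]])
data = data.reshape(len(settlement_dates), len(months) * len(info))
df = pd.DataFrame(data, index=settlement_dates,
                  columns=pd.MultiIndex.from_product([info, months]))

# Move the month level into the row index: one row per (date, month),
# with plain 'price' and 'year' columns.
long_df = df.stack()

# Ordinary boolean row indexing keeps price and year together.
kept = long_df[long_df['year'].eq(2021)]

# Pivot the months back into columns and restore the original layout;
# the (date, month) pairs that were filtered out become NaN.
result = kept.unstack().reindex(index=df.index, columns=df.columns)
print(result)
```

Recent pandas versions may emit a `FutureWarning` for `stack()` because its implementation is changing. For this data, which contains no missing values, the result should be the same either way.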
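Your solution also works, but it is more complicated: reindex the mask to all columns, then fill the missing values back in by back-filling and forward-filling: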

idx_cols = pd.IndexSlice
df_filter = df.loc[:, idx_cols['year', :]]==2021

df_filter = df_filter.reindex(df.columns, axis=1).stack(dropna=False).bfill(axis=1).ffill(axis=1).unstack()
print(df_filter)

            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31  False   True  False  False   True  False
2021-01-01   True  False  False   True  False  False

print(df[df_filter])

            price                 year              
           month0 month1 month2 month0 month1 month2
2020-12-31    NaN    4.0    NaN    NaN 2021.0    NaN
2021-01-01    1.0    NaN    NaN 2021.0    NaN    NaN
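As an alternative to filling the mask through `stack`/`bfill`/`ffill`, the same full-width mask can be built by repeating the `year` mask under each top-level key. This is a sketch, not part of the original answer; `year_mask`, `full_mask` and `result` are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

info = ['price', 'year']
months = ['month0', 'month1', 'month2']
settlement_dates = ['2020-12-31', '2021-01-01']
data = np.array([[[2, 4, 5], [2020, 2021, 2022]],
                 [[1, 4, 2], [2021, 2022, 2023]]])
data = data.reshape(len(settlement_dates), len(months) * len(info))
df = pd.DataFrame(data, index=settlement_dates,
                  columns=pd.MultiIndex.from_product([info, months]))

# Boolean mask with plain month columns, computed from 'year' alone.
year_mask = df['year'].eq(2021)

# Repeat that mask under every top-level key so it lines up with df.columns.
top_level = df.columns.get_level_values(0).unique()
full_mask = pd.concat({key: year_mask for key in top_level}, axis=1)

# Keep values where the mask is True, NaN elsewhere.
result = df.where(full_mask)
print(result)
```

This keeps the mask as a plain boolean frame throughout, so it avoids the intermediate NaN values that the reindex step introduces.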