Python: one index column to multiple index columns (pandas, MultiIndex)


I have a DataFrame whose header looks like this:

Time Peter_Price, Peter_variable 1, Peter_variable 2, Maria_Price, Maria_variable 1, Maria_variable 3,John_price,...
2017 12           985685466           Street 1       12           4984984984          Street 2       
2018 10           985785466           Street 3       78           4984974184          Street 8 
2019 12           985685466           Street 1       12           4984984984          Street 2 
2020 12           985685466           Street 1       12           4984984984          Street 2 
2021 12           985685466           Street 1       12           4984984984          Street 2

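What is the best MultiIndex layout for comparing variables across groups later on, for example to find which person has the highest variable 3, or to see the trend of variable 3 per person?

I think what I need is something like this, but I am open to other suggestions (this is my first time using a MultiIndex):

     Peter                      Maria                      
     Price  variable1 variable2 Price   variable1 variable3
Time                                                       
2017    12  985685466  Street 1    12  4984984984  Street 2
2018    10  985785466  Street 3    78  4984974184  Street 8
2019    12  985685466  Street 1    12  4984984984  Street 2
2020    12  985685466  Street 1    12  4984984984  Street 2
2021    12  985685466  Street 1    12  4984984984  Street 2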

Try:

Output:

     Peter                      Maria                      
     Price  variable1 variable2 Price   variable1 variable3
Time                                                       
2017    12  985685466  Street 1    12  4984984984  Street 2
2018    10  985785466  Street 3    78  4984974184  Street 8
2019    12  985685466  Street 1    12  4984984984  Street 2
2020    12  985685466  Street 1    12  4984984984  Street 2
2021    12  985685466  Street 1    12  4984984984  Street 2
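The snippet that produced this output is not shown. The comment thread mentions `pd.MultiIndex`, so one plausible reconstruction is sketched here: set `Time` as the index, then build two-level columns by splitting each name on the underscore. This is an assumption about the approach, not the answerer's exact code, and the column names `Peter_variable1` and so on are normalized here to match the output table.

```python
import pandas as pd

# Two sample rows in the flat "Person_variable" layout (values taken from the question)
df = pd.DataFrame({
    "Time": [2017, 2018],
    "Peter_Price": [12, 10],
    "Peter_variable1": [985685466, 985785466],
    "Peter_variable2": ["Street 1", "Street 3"],
    "Maria_Price": [12, 78],
    "Maria_variable1": [4984984984, 4984974184],
    "Maria_variable3": ["Street 2", "Street 8"],
})

# Line 1: move Time out of the columns
df = df.set_index("Time")

# Line 2: MultiIndex is a class on the pandas module (pd.MultiIndex),
# not an attribute of the DataFrame, so df.MultiIndex raises AttributeError
df.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split("_", 1)) for name in df.columns]
)

print(df)
```

Splitting with a maximum of one split keeps any later underscores inside the variable name rather than creating a third level.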


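You can try the following approach.

First, a code snippet that creates some data to experiment with. After moving `time` into the index and splitting the column names into two levels, you can use the `xs` method to select a specific variable for analysis at a single level.

Subsetting to only "variable 2":

Example analysis: who has the higher "Price" in each year: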


Comments:

The first line works, but the second line of code reports: 'DataFrame' object has no attribute 'MultiIndex'.

@jorgelbertopalacios It is `pd.MultiIndex`, not `df.MultiIndex`.

...that works too. Thanks Quang Hoang, this is the second time you have helped me.

Great. It works! Thank you.


import itertools

import numpy as np
import pandas as pd

people = ["Peter", "Maria"]
variables = ["Price", "variable 1", "variable 2"]  # renamed: "vars" shadows a built-in
columns = ["_".join(x) for x in itertools.product(people, variables)]

# 10 rows of random values plus a "time" column covering 2012-2021
df = (pd.DataFrame(np.random.rand(10, 6), columns=columns)
        .assign(time=np.arange(2012, 2022)))

print(df.head())
   Peter_Price  Peter_variable 1  Peter_variable 2  Maria_Price  Maria_variable 1  Maria_variable 2  time
0     0.542336          0.201243          0.616050     0.313119          0.652847          0.928497  2012
1     0.587392          0.143169          0.594997     0.553803          0.249188          0.076633  2013
2     0.447318          0.410310          0.443391     0.947064          0.476262          0.230092  2014
3     0.285560          0.018005          0.869387     0.165836          0.399670          0.307120  2015
4     0.422084          0.414453          0.626180     0.658528          0.286265          0.404369  2016

# Use "time" as the index, then split e.g. "Peter_Price" into ("Peter", "Price")
new_df = df.set_index("time")
new_df.columns = new_df.columns.str.split("_", expand=True)

print(new_df.head())
         Peter                           Maria
         Price variable 1 variable 2     Price variable 1 variable 2
time
2012  0.542336   0.201243   0.616050  0.313119   0.652847   0.928497
2013  0.587392   0.143169   0.594997  0.553803   0.249188   0.076633
2014  0.447318   0.410310   0.443391  0.947064   0.476262   0.230092
2015  0.285560   0.018005   0.869387  0.165836   0.399670   0.307120
2016  0.422084   0.414453   0.626180  0.658528   0.286265   0.404369

>>> new_df.xs("variable 2", level=1, axis=1)

         Peter     Maria
time
2012  0.616050  0.928497
2013  0.594997  0.076633
2014  0.443391  0.230092
2015  0.869387  0.307120
2016  0.626180  0.404369
2017  0.443827  0.544415
2018  0.425426  0.176707
2019  0.454269  0.414625
2020  0.863477  0.322609
2021  0.902759  0.821789
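By default `xs` drops the level it selects on, which is why only the person names remain as column labels. As a small sketch on a hypothetical two-row frame named `demo` (not the random data above), passing `drop_level=False` keeps both levels, and plain `[]` indexing on the outer level returns everything for one person:

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame with the same two-level column layout
cols = pd.MultiIndex.from_product(
    [["Peter", "Maria"], ["Price", "variable 1", "variable 2"]]
)
demo = pd.DataFrame(
    np.arange(12).reshape(2, 6),
    index=pd.Index([2012, 2013], name="time"),
    columns=cols,
)

# Default: the selected level is dropped, columns become the person names
dropped = demo.xs("variable 2", level=1, axis=1)

# drop_level=False keeps the (person, variable) pairs as column labels
kept = demo.xs("variable 2", level=1, axis=1, drop_level=False)

# Selecting on the outer level: all variables for one person
peter = demo["Peter"]

print(dropped)
print(kept)
print(peter)
```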

>>> new_df.xs("Price", level=1, axis=1).idxmax(axis=1)

time
2012    Peter
2013    Peter
2014    Maria
2015    Peter
2016    Maria
2017    Peter
2018    Maria
2019    Peter
2020    Maria
2021    Peter
dtype: object
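The question also asks who has the highest value overall and how a variable trends per person. A minimal sketch of both, assuming a numeric variable and using the first three years of `Price` from the question; the names `demo`, `overall_top`, `peak_year` and `yearly_change` are illustrative only:

```python
import pandas as pd

# Price values for 2017-2019 taken from the question's table
cols = pd.MultiIndex.from_product([["Peter", "Maria"], ["Price"]])
demo = pd.DataFrame(
    [[12, 12], [10, 78], [12, 12]],
    index=pd.Index([2017, 2018, 2019], name="Time"),
    columns=cols,
)

# One column per person for the chosen variable
price = demo.xs("Price", level=1, axis=1)

# Person whose maximum over all years is the largest
overall_top = price.max().idxmax()

# Year in which each person reached their own maximum (first occurrence on ties)
peak_year = price.idxmax()

# Year-over-year change per person, a simple view of the trend
yearly_change = price.diff()

print(overall_top)
print(peak_year)
print(yearly_change)
```

The same pattern applies to any other numeric variable by changing the label passed to `xs`; text columns such as the street names would need a different comparison.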