Python: operating on a DataFrame that may or may not have MultiIndex columns


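I have several functions that create new columns in a DataFrame as functions of its existing columns. There are two scenarios: (1) the DataFrame does not have MultiIndex columns and has a single set of columns, say [a, b]; (2) the DataFrame has MultiIndex columns, so the same set of column headers is repeated N times, say [(a, 1), (b, 1), (a, 2), (b, 2), ..., (a, N), (b, N)].

So far I have been writing these functions like this: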

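import pandas as pd

def f(df):
    # someFunction is the placeholder for any of the column-building functions
    if isinstance(df.columns, pd.MultiIndex):
        # MultiIndex columns: repeat the computation for every second-level label
        for s in df['a'].columns:
            df['c', s] = someFunction(df['a', s], df['b', s])
    else:
        # Flat columns: compute the new column once
        df['c'] = someFunction(df['a'], df['b'])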

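Is there another way to do this without scattering these if-MultiIndex/else branches everywhere and duplicating the someFunction call? I would rather not split the MultiIndex frame into N smaller DataFrames: I often need to filter the data or do something else while keeping the rows consistent across all of frames 1, 2, ..., N, and keeping them in a single frame seems the best way to do that.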

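You may still need to test whether the columns are a MultiIndex, but this should be cleaner and more efficient. One caveat: this will not work if your function uses summary statistics of a column, for example if someFunction divides by the mean of column 'a'.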
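To make the caveat concrete: once the second column level has been stacked into the rows (the approach this answer takes), a column-wide statistic is computed over all sub-columns pooled together instead of per sub-column. Below is a minimal sketch of that difference; scale_by_mean and the tiny frame are hypothetical and only serve as an illustration.

```python
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], ["one", "two"]])
df = pd.DataFrame(
    [[1.0, 10.0, 0.0, 0.0],
     [3.0, 30.0, 0.0, 0.0]],
    columns=columns,
)


def scale_by_mean(a):
    # Hypothetical function that depends on a column-wide statistic.
    return a / a.mean()


# Per sub-column: ('a', 'one') and ('a', 'two') are each divided by their own mean.
per_group = df["a"].apply(scale_by_mean)

# After stacking: every 'a' value is divided by one pooled mean,
# (1 + 10 + 3 + 30) / 4 = 11.
pooled = scale_by_mean(df.stack()["a"]).unstack()

print(per_group)
print(pooled)
```

The two results differ, so a function like this would still need per-sub-column handling (for instance a groupby on the stacked level).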

Setup for the solution

df looks like this:

          a                             b                    
        one       two     three       one       two     three
0  0.282834  0.490313  0.201300  0.140157  0.467710  0.352555
1  0.838527  0.707131  0.763369  0.265170  0.452397  0.968125
2  0.822786  0.785226  0.434637  0.146397  0.056220  0.003197
3  0.314795  0.414096  0.230474  0.595133  0.060608  0.900934
4  0.334733  0.118689  0.054299  0.237786  0.658538  0.057256
5  0.993753  0.552942  0.665615  0.336948  0.788817  0.320329
6  0.310809  0.199921  0.158675  0.059406  0.801491  0.134779
7  0.971043  0.183953  0.723950  0.909778  0.103679  0.695661
8  0.755384  0.728327  0.029720  0.408389  0.808295  0.677195
9  0.276158  0.978232  0.623972  0.897015  0.253178  0.093772
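A frame with this layout can be built along the following lines. This is only a sketch under assumptions: the seed and the use of random values are my own choices, so the numbers will not match the table above.

```python
import numpy as np
import pandas as pd

# Two top-level columns ('a', 'b'), each repeated for three sub-labels.
columns = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])

# Arbitrary seed: the values are random and differ from the printed table.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, len(columns))), columns=columns)

print(df)
```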

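I constructed df to have MultiIndex columns. What I am going to do is use the .stack() method to push the second level of the column index into the second level of the row index.

df.stack() looks like this: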

                a         b
0 one    0.282834  0.140157
  three  0.201300  0.352555
  two    0.490313  0.467710
1 one    0.838527  0.265170
  three  0.763369  0.968125
  two    0.707131  0.452397
2 one    0.822786  0.146397
  three  0.434637  0.003197
  two    0.785226  0.056220
3 one    0.314795  0.595133
  three  0.230474  0.900934
  two    0.414096  0.060608
4 one    0.334733  0.237786
  three  0.054299  0.057256
  two    0.118689  0.658538
5 one    0.993753  0.336948
  three  0.665615  0.320329
  two    0.552942  0.788817
6 one    0.310809  0.059406
  three  0.158675  0.134779
  two    0.199921  0.801491
7 one    0.971043  0.909778
  three  0.723950  0.695661
  two    0.183953  0.103679
8 one    0.755384  0.408389
  three  0.029720  0.677195
  two    0.728327  0.808295
9 one    0.276158  0.897015
  three  0.623972  0.093772
  two    0.978232  0.253178
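The reshape and its inverse can be checked with a short sketch like this one. The frame is rebuilt with random values (an assumption, not the data above). The order of the stacked labels may differ between pandas versions, so the comparison looks columns up by label.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])
df = pd.DataFrame(np.random.default_rng(0).random((10, 6)), columns=columns)

# The last column level moves into the row index: 10 rows x 3 labels = 30 rows.
stacked = df.stack()
print(stacked.shape)
print(stacked.index.nlevels)

# unstack() is the inverse: the inner row level goes back to the columns.
restored = stacked.unstack()
same = np.allclose(restored.reindex(columns=df.columns).to_numpy(), df.to_numpy())
print(same)
```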

Now you can operate on df.stack() as if the columns were not a MultiIndex.

Demonstration

Applying the function to df gives you what you want:

          a                             b                             c  \
        one     three       two       one     three       two       one   
0  0.282834  0.201300  0.490313  0.140157  0.352555  0.467710  0.565667   
1  0.838527  0.763369  0.707131  0.265170  0.968125  0.452397  1.677055   
2  0.822786  0.434637  0.785226  0.146397  0.003197  0.056220  1.645572   
3  0.314795  0.230474  0.414096  0.595133  0.900934  0.060608  0.629591   
4  0.334733  0.054299  0.118689  0.237786  0.057256  0.658538  0.669465   
5  0.993753  0.665615  0.552942  0.336948  0.320329  0.788817  1.987507   
6  0.310809  0.158675  0.199921  0.059406  0.134779  0.801491  0.621618   
7  0.971043  0.723950  0.183953  0.909778  0.695661  0.103679  1.942086   
8  0.755384  0.029720  0.728327  0.408389  0.677195  0.808295  1.510767   
9  0.276158  0.623972  0.978232  0.897015  0.093772  0.253178  0.552317   


      three       two  
0  0.402600  0.980626  
1  1.526739  1.414262  
2  0.869273  1.570453  
3  0.460948  0.828193  
4  0.108599  0.237377  
5  1.331230  1.105884  
6  0.317349  0.399843  
7  1.447900  0.367907  
8  0.059439  1.456654  
9  1.247944  1.956464
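Putting the pieces together, the whole approach might look roughly like the sketch below. It is not necessarily the exact code behind the output above: some_function is a hypothetical stand-in that adds its two inputs, whereas in the printed table c equals twice a, so a different function was evidently used there.

```python
import numpy as np
import pandas as pd


def some_function(a, b):
    # Hypothetical stand-in for someFunction: any element-wise operation works.
    return a + b


def f(df):
    """Add column 'c' whether or not df has MultiIndex columns."""
    is_multi = isinstance(df.columns, pd.MultiIndex)
    # Move the second column level into the rows so that 'a' and 'b'
    # become ordinary single-level columns.
    work = df.stack() if is_multi else df.copy()
    work["c"] = some_function(work["a"], work["b"])
    # Restore the wide layout in the MultiIndex case.
    return work.unstack() if is_multi else work


columns = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])
wide = pd.DataFrame(np.random.default_rng(0).random((4, 6)), columns=columns)
flat = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

print(f(wide))
print(f(flat))
```

The MultiIndex test appears once, at the reshaping step, and the computation itself is written a single time for both layouts.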

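Thanks, the stack method is very useful here. A question: is this OK when the DataFrames are large and a series of these methods is called repeatedly? That is, do stack and unstack actually allocate a new DataFrame and fill fresh memory each time, or do they only change the view onto the same values in memory?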
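One way to look into this empirically, rather than assume, is to compare the underlying buffers. The sketch below is a rough check only: np.shares_memory is applied to the arrays returned by to_numpy(), which may itself copy in some cases, and the result can depend on the pandas version, so no particular outcome is claimed here.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product([["a", "b"], ["one", "two", "three"]])
df = pd.DataFrame(np.random.default_rng(0).random((1000, 6)), columns=columns)

stacked = df.stack()

# True would mean both frames point at overlapping memory;
# False would mean stack() wrote the values into a separate buffer.
shares = np.shares_memory(df.to_numpy(), stacked.to_numpy())
print("shares memory:", shares)

# Size of the data in each layout (index excluded).
original_bytes = int(df.memory_usage(index=False).sum())
stacked_bytes = int(stacked.memory_usage(index=False).sum())
print("original bytes:", original_bytes)
print("stacked bytes: ", stacked_bytes)
```

If the check reports separate buffers, each stack/unstack round trip costs roughly one extra copy of the data, which is worth timing on a frame of realistic size before relying on the pattern in a hot loop.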
