Python: How to access prior rows in a multi-index pandas DataFrame

python pandas indexing dataframe

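How can I access rows inside a DataFrame with a multi-level index whose outer level is a datetime? For example, this is downloaded financial data. The hardest part is getting into the frame and accessing non-adjacent rows of a specific inner level without explicitly specifying the outer-level date, because I have thousands of rows like these.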

                                       ABC        DEF        GHI  \
Date                STATS
2012-07-19 00:00:00                    NaN        NaN        NaN
                    investment           4          9         13
                    price                5          8          1
                    quantity            12          9          8

So the two formulas I am looking for can be summarized as

X(today row) = quantity(prior row)*price(prior row) 
or                           
X(today row) = quantity(prior row)*price(today)

The difficulty is how to express access to these rows with numpy or pandas when the index is multi-level and the rows are not adjacent.
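One way to reach the non-adjacent rows is to pull each inner-level label out into its own date-indexed frame and then shift by position. The sketch below assumes the two days of sample data shown in the question's desired output (the unlabeled all-NaN rows are left out), and the names `x_prior_price` and `x_today_price` are made up for illustration. It treats "prior row" as "previous date present in the index", which may or may not be the previous calendar day.

```python
import pandas as pd

# Two days of sample data from the question (unlabeled NaN rows omitted).
dates = pd.to_datetime(["2012-07-19", "2012-07-18"])
stats = ["investment", "price", "quantity"]
index = pd.MultiIndex.from_product([dates, stats], names=["Date", "STATS"])
values = [
    [4, 9, 13], [5, 8, 1], [12, 9, 8],  # 2012-07-19
    [1, 2, 3], [2, 3, 4], [18, 6, 7],   # 2012-07-18
]
df = pd.DataFrame(values, index=index, columns=["ABC", "DEF", "GHI"], dtype=float)

# Sort so the oldest date comes first; "prior row" then means "previous date".
df = df.sort_index()

# xs selects one label of the inner level and drops that level,
# leaving frames indexed by Date only.
price = df.xs("price", level="STATS")
quantity = df.xs("quantity", level="STATS")

# shift(1) moves values down one position, so each date sees the prior date's data.
prior_quantity = quantity.shift(1)
prior_price = price.shift(1)

# min_count=1 keeps NaN for the first date, which has no prior row.
x_prior_price = (prior_quantity * prior_price).sum(axis=1, min_count=1)
x_today_price = (prior_quantity * price).sum(axis=1, min_count=1)

print(x_prior_price)
print(x_today_price)
```

No outer-level date is spelled out anywhere, so the same lines should apply unchanged to thousands of dates.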

In the end, I would arrive at the following:

                                         ABC        DEF        GHI    XN
Date                STATS                                            
2012-07-19 00:00:00                    NaN         NaN         NaN   
                    investment          4            9          13    X1
                    price               5            8           1   
                    quantity            12           9           8    

2012-07-18 00:00:00                    NaN         NaN         NaN   
                    investment          1             2          3    X2
                    price               2             3          4   
                    quantity           18             6          7    

X1= (18*2)+(6*3)+(7*4) (quantity_day_2 *price_day_2 data) 
or for the other formula
X1= (18*5)+(6*8)+(7*1) (quantity_day_2 *price_day_1 data)

Can I use groupby?
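As for groupby: grouping on the inner level and shifting within each group is one possible route, since it lines every row up with the previous date's row for the same STATS label while keeping the two-level index. This is only a sketch on the question's two days of sample data (unlabeled NaN rows omitted); the variable names are illustrative.

```python
import pandas as pd

# Two days of sample data from the question (unlabeled NaN rows omitted).
dates = pd.to_datetime(["2012-07-19", "2012-07-18"])
stats = ["investment", "price", "quantity"]
index = pd.MultiIndex.from_product([dates, stats], names=["Date", "STATS"])
values = [
    [4, 9, 13], [5, 8, 1], [12, 9, 8],  # 2012-07-19
    [1, 2, 3], [2, 3, 4], [18, 6, 7],   # 2012-07-18
]
df = pd.DataFrame(values, index=index, columns=["ABC", "DEF", "GHI"], dtype=float)
df = df.sort_index()  # oldest date first

# Within each STATS group, shift values down by one row:
# every (date, stat) row now holds the previous date's values for that stat.
prior = df.groupby(level="STATS").shift(1)

prior_q = prior.xs("quantity", level="STATS")
prior_p = prior.xs("price", level="STATS")

# X(today) = quantity(prior row) * price(prior row), summed over the columns.
x = (prior_q * prior_p).sum(axis=1, min_count=1)
print(x)
```

The shifted frame has the same index as `df`, so its rows can be compared with or assigned back to the original without any reshaping.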

You can use:

#add new datetime with data for better testing
print (df)
                        ABC  DEF   GHI
Date       STATS                      
2012-07-19              NaN  NaN   NaN
           investment   4.0  9.0  13.0
           price        5.0  8.0   1.0
           quantity    12.0  9.0   8.0
2012-07-18              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        2.0  3.0   4.0
           quantity    18.0  6.0   7.0
2012-07-17              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        0.0  1.0   4.0
           quantity     5.0  1.0   0.0
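The `p * q` outputs in this answer are indexed by Date alone, which suggests that `p` and `q` hold the price and quantity rows with the STATS level removed. That selection step does not appear in this listing, so the following is a sketch of one way to build them, assuming the three days of sample data above (without the unlabeled NaN rows) and a pandas version that provides `DataFrame.droplevel`.

```python
import pandas as pd

# Sample data from the answer (unlabeled NaN rows omitted).
dates = pd.to_datetime(["2012-07-19", "2012-07-18", "2012-07-17"])
stats = ["investment", "price", "quantity"]
index = pd.MultiIndex.from_product([dates, stats], names=["Date", "STATS"])
values = [
    [4, 9, 13], [5, 8, 1], [12, 9, 8],  # 2012-07-19
    [1, 2, 3], [2, 3, 4], [18, 6, 7],   # 2012-07-18
    [1, 2, 3], [0, 1, 4], [5, 1, 0],    # 2012-07-17
]
df = pd.DataFrame(values, index=index, columns=["ABC", "DEF", "GHI"], dtype=float)

# Slicing a MultiIndex with IndexSlice needs a lexsorted index.
df = df.sort_index()

idx = pd.IndexSlice
# Select one inner-level label for every date, then drop the inner level
# so that both frames are indexed by Date only and align on multiplication.
p = df.loc[idx[:, "price"], :].droplevel("STATS")
q = df.loc[idx[:, "quantity"], :].droplevel("STATS")

col1 = (p * q).sum(axis=1)                          # same-day price * quantity
col2 = (p.shift(-1, freq="D") * q).sum(axis=1)      # next-day price * quantity
print(col1)
print(col2)
```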

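If the output needs to be added to the original DataFrame, the solution is more complicated; it is the listing that starts with df.sort_index(inplace=True).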

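Could you modify the DataFrame to use small integer values (for easier verification) and add the desired output for the data sample? Thanks.

Oh yes, OK. Please let me know when the question needs to be changed based on the comments. Thanks.

@jezrael Fine.

There is one problem: what is the desired output? 2 new DataFrames?

Thank you for your hard work!! I will take a careful look and then comment. I think I can copy col2 onto my original pd.Series (XN), right? Because I really need to keep the inner level... :)

I am asking about the desired output: do you need a new column in the DataFrame? If yes, what is the index of this new column? Datetime and investment?

Ohhh. Yes, I need to add a new column to the DataFrame, and the new column will have its values on the investment rows. Thank you! :)

I created another solution, because it is completely different.

My email is in my profile ;) I don't know whether I will have time, but you can email me.

Please check my solution: the output is 2 DataFrames, one for the 1st and one for the 2nd condition.

Are you using (pd.concat([df, b], axis=1)).to_csv()?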

#price * quantity of the same day (p and q are indexed by Date only)
print (p * q)
             ABC   DEF   GHI
Date                        
2012-07-17   0.0   1.0   0.0
2012-07-18  36.0  18.0  28.0
2012-07-19  60.0  72.0   8.0

print ((p * q).sum(axis=1).to_frame().rename(columns={0:'col1'}))
             col1
Date             
2012-07-17    1.0
2012-07-18   82.0
2012-07-19  140.0

#shift row with -1, because lexsorted df
print (p.shift(-1, freq='D') * q)
             ABC   DEF  GHI
Date                       
2012-07-16   NaN   NaN  NaN
2012-07-17  10.0   3.0  0.0
2012-07-18  90.0  48.0  7.0
2012-07-19   NaN   NaN  NaN

print ((p.shift(-1, freq='D') * q).sum(axis=1).to_frame().rename(columns={0:'col2'}))
             col2
Date             
2012-07-16    0.0
2012-07-17   13.0
2012-07-18  145.0
2012-07-19    0.0
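In the col2 output, 2012-07-16 and 2012-07-19 show 0.0 even though every product in those rows is NaN. That is because `sum` skips NaN by default and the sum of nothing is 0. If NaN is preferred for dates that have no matching data, `min_count=1` is one option. A minimal sketch using the shifted products printed above:

```python
import numpy as np
import pandas as pd

# The shifted products from the output above.
products = pd.DataFrame(
    {
        "ABC": [np.nan, 10.0, 90.0, np.nan],
        "DEF": [np.nan, 3.0, 48.0, np.nan],
        "GHI": [np.nan, 0.0, 7.0, np.nan],
    },
    index=pd.to_datetime(["2012-07-16", "2012-07-17", "2012-07-18", "2012-07-19"]),
)

# Default behaviour: all-NaN rows sum to 0.0.
default_sum = products.sum(axis=1)

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN.
strict_sum = products.sum(axis=1, min_count=1)

print(default_sum)
print(strict_sum)
```

Whether 0.0 or NaN is the right value for those dates depends on how the new column is used later, so this is a design choice rather than a correction.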

#add the output to the original df as a new column on the investment rows
print (df)
                        ABC  DEF   GHI
Date       STATS                      
2012-07-19              NaN  NaN   NaN
           investment   4.0  9.0  13.0
           price        5.0  8.0   1.0
           quantity    12.0  9.0   8.0
2012-07-18              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        2.0  3.0   4.0
           quantity    18.0  6.0   7.0
2012-07-17              NaN  NaN   NaN
           investment   1.0  2.0   3.0
           price        0.0  1.0   4.0
           quantity     5.0  1.0   0.0

#lexsort the MultiIndex so that slicing with pd.IndexSlice works
df.sort_index(inplace=True)

#rename value in level to investment - align data in final concat
idx = pd.IndexSlice
p = df.loc[idx[:,'price'],:].rename(index={'price':'investment'})
q = df.loc[idx[:,'quantity'],:].rename(index={'quantity':'investment'})
print (p)
                       ABC  DEF  GHI
Date       STATS                    
2012-07-17 investment  0.0  1.0  4.0
2012-07-18 investment  2.0  3.0  4.0
2012-07-19 investment  5.0  8.0  1.0

print (q)
                        ABC  DEF  GHI
Date       STATS                     
2012-07-17 investment   5.0  1.0  0.0
2012-07-18 investment  18.0  6.0  7.0
2012-07-19 investment  12.0  9.0  8.0

#multiply and concat to original df
print (p * q)
                        ABC   DEF   GHI
Date       STATS                       
2012-07-17 investment   0.0   1.0   0.0
2012-07-18 investment  36.0  18.0  28.0
2012-07-19 investment  60.0  72.0   8.0

a = (p * q).sum(axis=1).rename('col1')
print (pd.concat([df, a], axis=1))
                        ABC  DEF   GHI   col1
Date       STATS                             
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0    1.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   82.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    NaN
           investment   4.0  9.0  13.0  140.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN

#shift with MultiIndex is not supported yet - first create a DatetimeIndex with unstack,
#then shift and finally reshape back to the original with stack

#multiply and concat to original df
print (p.unstack().shift(-1, freq='D').stack() * q)
                        ABC   DEF  GHI
Date       STATS                      
2012-07-16 investment   NaN   NaN  NaN
2012-07-17 investment  10.0   3.0  0.0
2012-07-18 investment  90.0  48.0  7.0
2012-07-19 investment   NaN   NaN  NaN

b = (p.unstack().shift(-1, freq='D').stack() * q).sum(axis=1).rename('col2')
print (pd.concat([df, b], axis=1))
                        ABC  DEF   GHI   col2
Date       STATS                             
2012-07-16 investment   NaN  NaN   NaN    0.0
2012-07-17              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0   13.0
           price        0.0  1.0   4.0    NaN
           quantity     5.0  1.0   0.0    NaN
2012-07-18              NaN  NaN   NaN    NaN
           investment   1.0  2.0   3.0  145.0
           price        2.0  3.0   4.0    NaN
           quantity    18.0  6.0   7.0    NaN
2012-07-19              NaN  NaN   NaN    NaN
           investment   4.0  9.0  13.0    0.0
           price        5.0  8.0   1.0    NaN
           quantity    12.0  9.0   8.0    NaN
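The last output gains an extra 2012-07-16 row because `pd.concat(..., axis=1)` keeps the union of both indexes, and the shifted prices start one day earlier than the original data. If only the original rows are wanted, assigning the Series as a column is one alternative, since assignment aligns on the existing index. The sketch below assumes the same sample data (without the unlabeled NaN rows) and also uses `min_count=1` so that dates without data stay NaN; both choices are illustrative, not part of the answer above.

```python
import pandas as pd

# Sample data from the answer (unlabeled NaN rows omitted).
dates = pd.to_datetime(["2012-07-19", "2012-07-18", "2012-07-17"])
stats = ["investment", "price", "quantity"]
index = pd.MultiIndex.from_product([dates, stats], names=["Date", "STATS"])
values = [
    [4, 9, 13], [5, 8, 1], [12, 9, 8],  # 2012-07-19
    [1, 2, 3], [2, 3, 4], [18, 6, 7],   # 2012-07-18
    [1, 2, 3], [0, 1, 4], [5, 1, 0],    # 2012-07-17
]
df = pd.DataFrame(values, index=index, columns=["ABC", "DEF", "GHI"], dtype=float)
df = df.sort_index()

idx = pd.IndexSlice
# Relabel price/quantity rows as 'investment' so the result lands on those rows.
p = df.loc[idx[:, "price"], :].rename(index={"price": "investment"})
q = df.loc[idx[:, "quantity"], :].rename(index={"quantity": "investment"})

# Same shift as in the answer: unstack to a DatetimeIndex, shift, stack back.
shifted_p = p.unstack().shift(-1, freq="D").stack()
b = (shifted_p * q).sum(axis=1, min_count=1).rename("col2")

outer = pd.concat([df, b], axis=1)  # union of indexes: adds a 2012-07-16 row
kept = df.assign(col2=b)            # aligned to df's index: no extra row

print(len(df), len(outer), len(kept))
print(kept)
```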