Python 如何访问多索引熊猫数据帧中的先前行
如何在Datetime索引的多级数据帧内访问,例如:这是下载的Fin数据。 最困难的部分是进入框架并访问特定内部级别的非相邻行,而不明确指定外部级别日期,因为我有数千行这样的行Python 如何访问多索引熊猫数据帧中的先前行,python,pandas,indexing,dataframe,multi-index,Python,Pandas,Indexing,Dataframe,Multi Index,如何在Datetime索引的多级数据帧内访问,例如:这是下载的Fin数据。 最困难的部分是进入框架并访问特定内部级别的非相邻行,而不明确指定外部级别日期,因为我有数千行这样的行 ABC DEF GHI \ Date STATS 2012-07-19 00:00:00
ABC DEF GHI \
Date STATS
2012-07-19 00:00:00 NaN NaN NaN
investment 4 9 13
price 5 8 1
quantity 12 9 8
因此,我正在搜索的两个公式可以总结为
X(today row) = quantity(prior row)*price(prior row)
or
X(today row) = quantity(prior row)*price(today)
困难在于如何使用numpy或panda为多级索引制定对这些行的访问,并且这些行不是相邻的
最后,我会得出以下结论:
ABC DEF GHI XN
Date STATS
2012-07-19 00:00:00 NaN NaN NaN
investment 4 9 13 X1
price 5 8 1
quantity 12 9 8
2012-07-18 00:00:00 NaN NaN NaN
investment 1 2 3 X2
price 2 3 4
quantity 18 6 7
X1= (18*2)+(6*3)+(7*4) (quantity_day_2 *price_day_2 data)
or for the other formula
X1= (18*5)+(6*8)+(7*1) (quantity_day_2 *price_day_1 data)
我可以使用groupby吗?您可以使用:
#add new datetime with data for better testing
print (df)
ABC DEF GHI
Date STATS
2012-07-19 NaN NaN NaN
investment 4.0 9.0 13.0
price 5.0 8.0 1.0
quantity 12.0 9.0 8.0
2012-07-18 NaN NaN NaN
investment 1.0 2.0 3.0
price 2.0 3.0 4.0
quantity 18.0 6.0 7.0
2012-07-17 NaN NaN NaN
investment 1.0 2.0 3.0
price 0.0 1.0 4.0
quantity 5.0 1.0 0.0
如果需要将输出添加到原始的
数据帧
,则更复杂:
print (df)
ABC DEF GHI
Date STATS
2012-07-19 NaN NaN NaN
investment 4.0 9.0 13.0
price 5.0 8.0 1.0
quantity 12.0 9.0 8.0
2012-07-18 NaN NaN NaN
investment 1.0 2.0 3.0
price 2.0 3.0 4.0
quantity 18.0 6.0 7.0
2012-07-17 NaN NaN NaN
investment 1.0 2.0 3.0
price 0.0 1.0 4.0
quantity 5.0 1.0 0.0
您能否使用小整数值修改数据帧(以便于验证),并从数据样本中添加所需的输出?谢谢。哦,是的,好的。请告诉我什么时候问题会被评论改变。谢谢。@jezrael很好。有一个问题-期望的输出是什么?2个新的
DataFrame
s?感谢您的辛勤工作!!我会仔细看一下,然后再评论。。我想我可以把col2复制到我原来的pd.Series(XN)上,对吧?因为我真的需要保持内心的水平……)我询问所需的输出-您需要新的列来DataFrame
?如果是,这个新列的索引是什么<代码>日期时间和投资
?ohhh。是的,我需要在数据框中添加一个新列
新列的索引将具有投资
上的值。谢谢!:)我创建了另一个解决方案,因为它完全不同。我的个人资料中有电子邮件;)但我不知道我是否有时间,但你可以给我发电子邮件。请检查我的解决方案-输出为2数据帧-为1.st和2。条件。您是否使用(pd.concat([df,b],axis=1))。来_csv()
?
print (p * q)
ABC DEF GHI
Date
2012-07-17 0.0 1.0 0.0
2012-07-18 36.0 18.0 28.0
2012-07-19 60.0 72.0 8.0
print ((p * q).sum(axis=1).to_frame().rename(columns={0:'col1'}))
col1
Date
2012-07-17 1.0
2012-07-18 82.0
2012-07-19 140.0
#shift row with -1, because lexsorted df
print (p.shift(-1, freq='D') * q)
ABC DEF GHI
Date
2012-07-16 NaN NaN NaN
2012-07-17 10.0 3.0 0.0
2012-07-18 90.0 48.0 7.0
2012-07-19 NaN NaN NaN
print ((p.shift(-1, freq='D') * q).sum(axis=1).to_frame().rename(columns={0:'col2'}))
col2
Date
2012-07-16 0.0
2012-07-17 13.0
2012-07-18 145.0
2012-07-19 0.0
print (df)
ABC DEF GHI
Date STATS
2012-07-19 NaN NaN NaN
investment 4.0 9.0 13.0
price 5.0 8.0 1.0
quantity 12.0 9.0 8.0
2012-07-18 NaN NaN NaN
investment 1.0 2.0 3.0
price 2.0 3.0 4.0
quantity 18.0 6.0 7.0
2012-07-17 NaN NaN NaN
investment 1.0 2.0 3.0
price 0.0 1.0 4.0
quantity 5.0 1.0 0.0
df.sort_index(inplace=True)
#rename value in level to investment - align data in final concat
idx = pd.IndexSlice
p = df.loc[idx[:,'price'],:].rename(index={'price':'investment'})
q = df.loc[idx[:,'quantity'],:].rename(index={'quantity':'investment'})
print (p)
ABC DEF GHI
Date STATS
2012-07-17 investment 0.0 1.0 4.0
2012-07-18 investment 2.0 3.0 4.0
2012-07-19 investment 5.0 8.0 1.0
print (q)
ABC DEF GHI
Date STATS
2012-07-17 investment 5.0 1.0 0.0
2012-07-18 investment 18.0 6.0 7.0
2012-07-19 investment 12.0 9.0 8.0
#multiple and concat to original df
print (p * q)
ABC DEF GHI
Date STATS
2012-07-17 investment 0.0 1.0 0.0
2012-07-18 investment 36.0 18.0 28.0
2012-07-19 investment 60.0 72.0 8.0
a = (p * q).sum(axis=1).rename('col1')
print (pd.concat([df, a], axis=1))
ABC DEF GHI col1
Date STATS
2012-07-17 NaN NaN NaN NaN
investment 1.0 2.0 3.0 1.0
price 0.0 1.0 4.0 NaN
quantity 5.0 1.0 0.0 NaN
2012-07-18 NaN NaN NaN NaN
investment 1.0 2.0 3.0 82.0
price 2.0 3.0 4.0 NaN
quantity 18.0 6.0 7.0 NaN
2012-07-19 NaN NaN NaN NaN
investment 4.0 9.0 13.0 140.0
price 5.0 8.0 1.0 NaN
quantity 12.0 9.0 8.0 NaN
#shift with Multiindex - not supported yet - first create Datatimeindex with unstack
#, then shift and last reshape to original by stack
#multiple and concat to original df
print (p.unstack().shift(-1, freq='D').stack() * q)
ABC DEF GHI
Date STATS
2012-07-16 investment NaN NaN NaN
2012-07-17 investment 10.0 3.0 0.0
2012-07-18 investment 90.0 48.0 7.0
2012-07-19 investment NaN NaN NaN
b = (p.unstack().shift(-1, freq='D').stack() * q).sum(axis=1).rename('col2')
print (pd.concat([df, b], axis=1))
ABC DEF GHI col2
Date STATS
2012-07-16 investment NaN NaN NaN 0.0
2012-07-17 NaN NaN NaN NaN
investment 1.0 2.0 3.0 13.0
price 0.0 1.0 4.0 NaN
quantity 5.0 1.0 0.0 NaN
2012-07-18 NaN NaN NaN NaN
investment 1.0 2.0 3.0 145.0
price 2.0 3.0 4.0 NaN
quantity 18.0 6.0 7.0 NaN
2012-07-19 NaN NaN NaN NaN
investment 4.0 9.0 13.0 0.0
price 5.0 8.0 1.0 NaN
quantity 12.0 9.0 8.0 NaN