Python Pandas: how to scale a column in a MultiIndex DataFrame by the top row of each level=0 group
I have a MultiIndex DataFrame dfu:
                          open   high     low    close
    Date       Time
    2016-11-28 09:43:00  26.03  26.03  26.030  26.030
               09:48:00  25.90  25.90  25.760  25.760
               09:51:00  26.00  26.00  25.985  25.985
    2016-11-29 09:30:00  24.98  24.98  24.98   24.9800
               09:33:00  25.00  25.00  24.99   24.9900
               09:35:00  25.33  25.46  25.33   25.4147
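I want to create a new column ['closeScaled'] that is computed by calling a function foo with two arguments: the ['open'] value from the first row of the current level=0 group, and the ['close'] value of the current row. I suspect the solution will involve something like this:

    dfu['closeScaled'] = dfu.apply(lambda x: foo(*get first row of current date*[0], x[3]))

I can't figure out the "get first row of current level=0" part.

If foo is:

    def foo(firstOpen, currentClose):
        return (currentClose / firstOpen)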
then I would expect the closeScaled column to contain (truncated to 4 decimal places):

                          open   high     low    close  closeScaled
    Date       Time
    2016-11-28 09:43:00  26.03  26.03  26.030  26.030        1.0000
               09:48:00  25.90  25.90  25.760  25.760        0.9896
               09:51:00  26.00  26.00  25.985  25.985        0.9982
    2016-11-29 09:30:00  24.98  24.98  24.98   24.9800       1.0000
               09:33:00  25.00  25.00  24.99   24.9900       1.0004
               09:35:00  25.33  25.46  25.33   25.4147       1.0174

You can divide close by the Series created with groupby plus transform('first'). If you need to truncate the float values to 4 decimal places, first multiply by 10000, convert to int, then divide by 10000:
    # Parentheses allow the method chain to continue on the next line.
    dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                             .mul(10000).astype(int).div(10000))
    print(dfu)

                          open   high     low    close  closeScaled
    Date       Time
    2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
               09:48:00  25.90  25.90  25.760  25.7600       0.9896
               09:51:00  26.00  26.00  25.985  25.9850       0.9982
    2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
               09:33:00  25.00  25.00  24.990  24.9900       1.0004
               09:35:00  25.33  25.46  25.330  25.4147       1.0174
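For readers who want to reproduce the division step on its own, here is a minimal sketch. The way dfu is built below is an assumption (the question only shows its printed form, so Date and Time are kept as plain strings), and first_open is a name introduced purely for illustration. Because foo is plain arithmetic, it can be called on whole Series rather than row by row.

```python
import pandas as pd

# Assumed reconstruction of the sample frame from the question;
# the index levels are plain strings here for simplicity.
index = pd.MultiIndex.from_tuples(
    [
        ("2016-11-28", "09:43:00"),
        ("2016-11-28", "09:48:00"),
        ("2016-11-28", "09:51:00"),
        ("2016-11-29", "09:30:00"),
        ("2016-11-29", "09:33:00"),
        ("2016-11-29", "09:35:00"),
    ],
    names=["Date", "Time"],
)
dfu = pd.DataFrame(
    {
        "open": [26.03, 25.90, 26.00, 24.98, 25.00, 25.33],
        "high": [26.03, 25.90, 26.00, 24.98, 25.00, 25.46],
        "low": [26.030, 25.760, 25.985, 24.98, 24.99, 25.33],
        "close": [26.030, 25.760, 25.985, 24.98, 24.99, 25.4147],
    },
    index=index,
)


def foo(firstOpen, currentClose):
    return currentClose / firstOpen


# Broadcast each date's first 'open' value to every row of that date.
first_open = dfu.groupby(level=0)["open"].transform("first")

# foo works element-wise on the two aligned Series (no truncation yet).
dfu["closeScaled"] = foo(first_open, dfu["close"])
print(dfu)
```

transform('first') returns a Series with the same index as dfu, which is why the division lines up row by row without any explicit loop or apply.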
Using groupby + apply + lambda. Note that round(4) rounds rather than truncates, so the third row shows 0.9983 instead of 0.9982:

    # group_keys=False keeps the original (Date, Time) index on recent pandas versions.
    dfu.groupby(level=0, group_keys=False).apply(
        lambda g: g.assign(closeScaled=g['close'].div(g['open'].iloc[0]).round(4))
    )

                          open   high     low    close  closeScaled
    Date       Time
    2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
               09:48:00  25.90  25.90  25.760  25.7600       0.9896
               09:51:00  26.00  26.00  25.985  25.9850       0.9983
    2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
               09:33:00  25.00  25.00  24.990  24.9900       1.0004
               09:35:00  25.33  25.46  25.330  25.4147       1.0174
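The difference between rounding and truncating can be seen on two of the ratios from the sample data. This is a small sketch rather than part of either answer; using np.trunc for the truncation is my own substitution for the multiply, astype(int), divide chain, and the names ratio, rounded and truncated are illustrative.

```python
import numpy as np
import pandas as pd

# Two close / first-open ratios taken from the 2016-11-28 group.
ratio = pd.Series([25.985 / 26.03, 25.76 / 26.03])

# Round to 4 decimal places (what .round(4) does).
rounded = ratio.round(4)

# Truncate to 4 decimal places: scale up, drop the fraction, scale back.
truncated = np.trunc(ratio * 10**4) / 10**4

print(pd.DataFrame({"ratio": ratio, "rounded": rounded, "truncated": truncated}))
```

The first ratio is about 0.99827, so rounding gives 0.9983 while truncation gives 0.9982; the second ratio comes out as 0.9896 either way.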
Alternatively, truncate with a string-based helper function:

    # http://stackoverflow.com/a/783927/2901002
    def truncate(f, n):
        '''Truncates/pads a float f to n decimal places without rounding'''
        s = '{}'.format(f)
        if 'e' in s or 'E' in s:
            return '{0:.{1}f}'.format(f, n)
        i, p, d = s.partition('.')
        return '.'.join([i, (d+'0'*n)[:n]])

    # truncate returns a string, so convert back to float afterwards.
    dfu['closeScaled'] = (dfu.close.div(dfu.groupby(level=0)['open'].transform('first'))
                             .apply(lambda x: truncate(x, 4)).astype(float))
    print(dfu)

                          open   high     low    close  closeScaled
    Date       Time
    2016-11-28 09:43:00  26.03  26.03  26.030  26.0300       1.0000
               09:48:00  25.90  25.90  25.760  25.7600       0.9896
               09:51:00  26.00  26.00  25.985  25.9850       0.9982
    2016-11-29 09:30:00  24.98  24.98  24.980  24.9800       1.0000
               09:33:00  25.00  25.00  24.990  24.9900       1.0004
               09:35:00  25.33  25.46  25.330  25.4147       1.0174
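To see what the truncate helper does on individual values, here is a short standalone check. The function body is copied from the answer above; the sample inputs are my own and chosen only for illustration.

```python
def truncate(f, n):
    '''Truncates/pads a float f to n decimal places without rounding'''
    s = '{}'.format(f)
    if 'e' in s or 'E' in s:
        # Values printed in scientific notation fall back to fixed-point formatting.
        return '{0:.{1}f}'.format(f, n)
    i, p, d = s.partition('.')
    # Pad the fractional digits with zeros, then cut to n digits.
    return '.'.join([i, (d + '0' * n)[:n]])


print(truncate(0.99827, 4))  # 0.9982 (digits are cut, not rounded)
print(truncate(1.0, 4))      # 1.0000 (short fractions are zero-padded)
print(truncate(1e-07, 4))    # 0.0000 (scientific-notation branch)
```

The helper returns strings, which is why the pandas expression that uses it ends with astype(float).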