Python: calculating the geometric mean return for specific rows
I have a DataFrame like this:
Date        price   mid       std       top        btm
..............
1999-07-21  8.6912  8.504580  0.084923   9.674425  8.334735
1999-07-22  8.6978  8.508515  0.092034   8.692583  8.324447
1999-07-23  8.8127  8.524605  0.118186  10.760976  8.288234
1999-07-24  8.8779  8.688810  0.091124   8.871057  8.506563
..............
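I want to create a new column named 'diff'.
If 'price' > 'top' in a row, I want to fill that row's 'diff' with the geometric mean return between that row's price and the price 5 rows earlier (row n-5), i.e. the 5-day geometric mean.
For example, in row 1999-07-22 the price is greater than top, so I want to fill 'diff' in that row with the geometric mean of 07-22 and 07-17. Note that the dates may not be consecutive, because holidays are excluded. Only a small fraction of rows meet the condition, so most values in 'diff' will be missing.
Could you tell me how to do this in Python?

Use `where` to set NaNs on the rows that do not meet the condition.
EDIT:
I believe you need: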
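import pandas as pd
from scipy.stats.mstats import gmean

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
# Geometric mean of price over a trailing 5-calendar-day window,
# kept only on rows where price > top (all other rows become NaN)
df['gmean'] = (df['price'].rolling('5d')
                          .apply(gmean, raw=True)
                          .where(df['price'] > df['top']))
print(df)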
             price       mid       std        top       btm     gmean
Date
1999-07-21  8.6912  8.504580  0.084923   9.674425  8.334735       NaN
1999-07-22  8.6978  8.508515  0.092034   8.692583  8.324447  8.694499
1999-07-23  8.8127  8.524605  0.118186  10.760976  8.288234       NaN
1999-07-24  8.8779  8.688810  0.091124   8.871057  8.506563  8.769546
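
One thing worth noting: `rolling('5d')` counts calendar days, while the question talks about the price 5 rows earlier (trading days, with holidays skipped). If what is wanted is a per-period geometric mean return, one possible reading is (P_t / P_{t-n})^(1/n) - 1, computed with a row-based `shift`. The sketch below assumes that reading; the function name `geometric_mean_return` and the sample prices are made up for illustration and are not from the question.

```python
import pandas as pd


def geometric_mean_return(df, n=5):
    """Per-period geometric mean return over the previous n rows,
    kept only where price > top (hypothetical helper)."""
    # Growth factor between this row and the row n positions earlier
    growth = df["price"] / df["price"].shift(n)
    # n-th root of the growth factor, minus 1, is the average per-period return
    ret = growth ** (1.0 / n) - 1.0
    # Rows that do not satisfy the condition become NaN
    return ret.where(df["price"] > df["top"])


# Invented sample data (weekends skipped), for illustration only
sample = pd.DataFrame(
    {
        "price": [100.0, 101.0, 102.0, 103.0, 104.0, 110.0, 105.0],
        "top": [120.0, 120.0, 120.0, 120.0, 120.0, 108.0, 120.0],
    },
    index=pd.to_datetime(
        ["1999-07-15", "1999-07-16", "1999-07-19", "1999-07-20",
         "1999-07-21", "1999-07-22", "1999-07-23"]
    ),
)
sample["diff"] = geometric_mean_return(sample, n=5)
print(sample)
```

Because `shift(n)` works on row positions, gaps in the dates do not matter here; only the row on 1999-07-22 gets a value, since it is the only one with price above top and 5 earlier rows available.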
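You can do this by taking the difference of the price and top columns and then assigning NaN where that difference is not positive.

Here is another solution: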
import pandas as pd
from functools import reduce

ddict = {
    'Date': ['1999-07-21', '1999-07-22', '1999-07-23', '1999-07-24'],
    'price': [8.6912, 8.6978, 8.8127, 8.8779],
    'mid': [8.504580, 8.508515, 8.524605, 8.688810],
    'std': [0.084923, 0.092034, 0.118186, 0.091124],
    'top': [9.674425, 8.692583, 10.760976, 8.871057],
    'btm': [8.334735, 8.324447, 8.288234, 8.506563],
}
data = pd.DataFrame(ddict)


def geo_mean(values):
    """
    Geometric mean function. Pass a list of numbers or of aligned Series.
    """
    return reduce(lambda a, b: a * b, values) ** (1.0 / len(values))


def set_geo_mean(df):
    # Shift the price column down one period (use df, not the global data)
    df['shifted price'] = df['price'].shift(periods=1)
    # Create a boolean mask that evaluates price vs top
    masked_expression = df['price'] > df['top']
    # Select the rows of the dataframe where the mask is true
    masked_data = df[masked_expression]
    # Element-wise geometric mean of price and previous price on those rows
    df.loc[masked_expression, 'geo_mean'] = geo_mean(
        [masked_data['price'], masked_data['shifted price']])
    # Drop the shifted price column once complete
    df.drop('shifted price', axis=1, inplace=True)


if __name__ == '__main__':
    # Call function and pass dataframe argument.
    set_geo_mean(data)
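
For what it's worth, the geometric mean of two positive numbers is just the square root of their product, so the same price/previous-price result can probably be had without `reduce`. A minimal sketch, assuming the four sample rows from this answer and that only the current and previous row are wanted (the frame name `prices` is mine):

```python
import numpy as np
import pandas as pd

# The four sample rows from the answer, reduced to the columns that are used
prices = pd.DataFrame(
    {
        "price": [8.6912, 8.6978, 8.8127, 8.8779],
        "top": [9.674425, 8.692583, 10.760976, 8.871057],
    },
    index=pd.to_datetime(
        ["1999-07-21", "1999-07-22", "1999-07-23", "1999-07-24"]
    ),
)

# Geometric mean of each price with the previous row's price
pair_gmean = np.sqrt(prices["price"] * prices["price"].shift(1))
# Keep the value only where price > top; other rows become NaN
prices["geo_mean"] = pair_gmean.where(prices["price"] > prices["top"])
print(prices)
```

This avoids the temporary 'shifted price' column and does not modify the frame beyond adding 'geo_mean'.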
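Sorry, I oversimplified the question earlier, but I realize I should have been more specific. Please see my question after the edit.
@JAKE - not sure I understand your question, but if you need the geometric mean over each 5 days, check the solution with rolling gmean.
OK, then could you add sample data with 10 rows and the expected output? I think the unnecessary columns should be removed from the sample data.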
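Code for the approach that takes the difference of the price and top columns:

import pandas as pd
import numpy as np

df = pd.DataFrame(...)
# Difference between price and top on every row
df['diff'] = df['price'] - df['top']
# Blank out rows where price does not exceed top
df.loc[df['diff'] <= 0, 'diff'] = np.nan  # or 0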
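
To see what the difference approach produces, here is a small runnable sketch using the price and top values from the sample rows in this thread. It uses `Series.where` instead of a second `.loc` assignment, which is my own substitution and should give the same column. Note that this yields the raw price minus top gap, not a geometric mean.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "price": [8.6912, 8.6978, 8.8127, 8.8779],
        "top": [9.674425, 8.692583, 10.760976, 8.871057],
    },
    index=pd.to_datetime(
        ["1999-07-21", "1999-07-22", "1999-07-23", "1999-07-24"]
    ),
)

# Gap between price and top on every row
excess = df["price"] - df["top"]
# Keep only positive gaps; everything else becomes NaN
df["diff"] = excess.where(excess > 0)
print(df)
```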