Python: specifying test rows for backtesting and machine learning


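I want to use machine learning to predict the price movement of an asset. So far I have the data and the results. Now I want to backtest the model. The premise is very simple: buy and hold whenever the prediction is 1. I want to apply the prediction model, iterate over the test rows from bottom to top, check whether the predicted output matches the corresponding label (the labels here are -1 and 1), and then do some calculations.

Here is the code: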

def backtest():
    x = df[['open', 'high', 'low', 'close', 'vol']]
    y = df['label']
    z = np.array(df['log_ret'].values)

    test_size = 366
    rf = RandomForestClassifier(n_estimators = 100)
    rf.fit(x[:-test_size],y[:-test_size])

    invest_amount = 1000
    trade_qty = 0
    correct_count = 0

    for i in range(1, test_size):
        if rf.predict(x[-i])[0] == y[-i]:
            correct_count += 1

        if rf.predict(x[-i])[0] == 1:
            invest_return = invest_amount + (invest_amount * (z[-i]/100))
            trade_qty += 1

    print('accuracy:', (correct_count/test_size)*100)
    print('total trades:', trade_qty)
    print('profits:', invest_return)

backtest()

So far, I am stuck on this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: -1

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-29-feab89792f26> in <module>
     22 
     23 for i in range(1, test_size):
---> 24     if rf.predict(x[-i])[0] == y[-i]:
     25         correct_count += 1
     26 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: -1

The code below solves the problem with a few modifications:

def backtest():
    x = df[['open', 'high', 'low', 'close', 'vol']]
    y = df['label']
    z = np.array(df['log_ret'].values)

    test_size = 366
    rf = RandomForestClassifier(n_estimators = 100)
    rf.fit(x[:-test_size],y[:-test_size])

    invest_amount = 1000
    invest_return = invest_amount  # defined up front so the final print works even with zero trades
    trade_qty = 0
    correct_count = 0

    # Iterate over index labels test_size-1 down to 1
    for i in range(1, test_size)[::-1]:
        # x[x.index == i] selects the row whose index label equals i, as a one-row DataFrame
        if rf.predict(x[x.index == i])[0] == y[i]:
            correct_count += 1

        if rf.predict(x[x.index == i])[0] == 1:
            invest_return = invest_amount + (invest_amount * (z[i]/100))
            trade_qty += 1

    print('accuracy:', (correct_count/test_size)*100)
    print('total trades:', trade_qty)
    print('profits:', invest_return)

backtest()

Explanation of the modifications:

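  • Access DataFrame rows by filtering on the index: x[x.index == i]
  • Replace the negative indexing with a reversed range, which needs fewer adaptations: range(1, test_size)[::-1]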
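
Some background on why the original `x[-i]` fails: indexing a DataFrame with a scalar in square brackets is a column lookup, so `x[-1]` searches for a column labelled -1 and raises the `KeyError: -1` shown in the traceback. The following is a minimal sketch of positional row access with `.iloc` as an alternative; the small frame and series are made-up stand-ins, not the data from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the feature columns and labels.
x = pd.DataFrame({"open": [1.0, 2.0, 3.0], "close": [1.5, 2.5, 3.5]})
y = pd.Series([1, -1, 1])

# x[-1] is treated as a column label, not a row position, so it fails.
try:
    x[-1]
    raised = False
except KeyError:
    raised = True

# .iloc is purely positional. The double brackets keep the result as a
# one-row DataFrame (2-D), which is the input shape predict() expects.
last_row = x.iloc[[-1]]
last_label = y.iloc[-1]

print(raised)          # True
print(last_row.shape)  # (1, 2)
print(last_label)      # 1
```

With this approach the loop in the question could keep its negative positions, using `x.iloc[[-i]]` and `y.iloc[-i]` in place of `x[-i]` and `y[-i]`.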
Generating a test case:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    
    data = {'open': np.random.rand(1000), 
            'high': np.random.rand(1000), 
            'low': np.random.rand(1000), 
            'close': np.random.rand(1000), 
            'vol': np.random.rand(1000),
            'log_ret': np.random.rand(1000),
            'label': np.random.choice([-1,1], 1000)}
    
    df = pd.DataFrame(data)
    
This produces the following result:

    >> backtest()
    accuracy: 99.72677595628416
    total trades: 181
    profits: 1006.8351193358026
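
One caveat about the result above: the labels are random, yet the accuracy is 99.73%, which is exactly 365/366. With a 1000-row frame and a default integer index, `range(1, test_size)` picks index labels 1 to 365, and those rows appear to lie inside the training slice `x[:-test_size]` (labels 0 to 633), so the model would be scored on rows it was fitted on. The sketch below is one possible way to evaluate only the held-out last `test_size` rows; the function name `backtest_holdout`, the fixed seed, and the treatment of `log_ret` as fractional log returns are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

FEATURES = ["open", "high", "low", "close", "vol"]

def backtest_holdout(df, test_size=366, invest_amount=1000.0, seed=0):
    """Fit on all but the last `test_size` rows, score on those rows only."""
    train, test = df.iloc[:-test_size], df.iloc[-test_size:]

    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    rf.fit(train[FEATURES], train["label"])

    # One vectorized predict call instead of one call per row.
    pred = rf.predict(test[FEATURES])

    accuracy = float((pred == test["label"].to_numpy()).mean() * 100)
    long_mask = pred == 1
    trade_qty = int(long_mask.sum())

    # Assumption: log_ret holds per-period log returns as fractions
    # (0.01 means roughly 1%), so the held periods compound as exp(sum).
    held = test["log_ret"].to_numpy()[long_mask]
    final_value = float(invest_amount * np.exp(held.sum()))
    return accuracy, trade_qty, final_value

# Synthetic data in the spirit of the test case above (hypothetical values).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "open": rng.random(n),
    "high": rng.random(n),
    "low": rng.random(n),
    "close": rng.random(n),
    "vol": rng.random(n),
    "log_ret": rng.normal(0.0, 0.01, n),
    "label": rng.choice([-1, 1], n),
})

# How many of the labels 1..365 fall inside the training slice?
overlap = int(df.index[:-366].isin(np.arange(1, 366)).sum())
print("labels 1..365 found in training slice:", overlap)

accuracy, trade_qty, final_value = backtest_holdout(df)
print("accuracy:", accuracy)
print("total trades:", trade_qty)
print("final value:", final_value)
```

On random labels like these, a held-out accuracy close to 50% is what one would expect; a value near 100% is a hint that training rows leaked into the evaluation.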
    

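Is the code in the question indented correctly? If not, could you fix it?

OK, I just made some edits. I think it should be fine now? Please let me know. I'm not very familiar with the formatting; it still doesn't look quite right, but I think I managed to get it.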