Python rolling stdev to remove outliers with NaNs

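Right, so I'm a bit rusty with Python (picking it back up after 4 years) and I've been looking for a solution to this problem. There are similar threads, but I can't figure out what I'm doing wrong.

I have some data that looks like this: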

print (fwds)
        1y1yUSD   1y1yEUR    1y1yAUD  1y1yCAD   1y1yCHF   1y1yGBP  \
Date                                                                    
2019-10-15  1.47518 -0.503679   0.681473  1.84996 -0.804212  0.626394   
2019-10-14      NaN -0.513647   0.684232      NaN -0.815201  0.643280   
2019-10-11  1.51515 -0.520474   0.654544  1.84918 -0.812819  0.697584   
2019-10-10  1.39085 -0.538651   0.564055  1.72812 -0.846291  0.546696   
2019-10-09  1.30827 -0.568942   0.564897  1.63652 -0.896871  0.479307   
...             ...       ...        ...      ...       ...       ...   
1995-01-09  8.59473       NaN  10.830200  9.59729       NaN  9.407250   
1995-01-06  8.58316       NaN  10.851200  9.42043       NaN  9.434480   
1995-01-05  8.56470       NaN  10.839000  9.51209       NaN  9.560490   
1995-01-04  8.44306       NaN  10.745900  9.51142       NaN  9.507650   
1995-01-03  8.58847       NaN        NaN  9.38380       NaN  9.611590   
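The problem is that the data quality isn't great, and I need to remove outliers on a rolling basis (these time series have been trending, so static z-scores won't work).

I've tried a few solutions. One was to compute a rolling z-score and then filter out the large z-scores. But when I try to compute the z-score, my results are all NaN: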

def zscore(x, window):
    r = x.rolling(window=window)
    m = r.mean().shift(1)
    s = r.std(ddof=0, skipna=True).shift(1)
    z = (x-m)/s
    return z
print (fwds)
print (zscore(fwds, 200))
        1y1yUSD  1y1yEUR  1y1yAUD  1y1yCAD  1y1yCHF  1y1yGBP  1y1yJPY  \
Date                                                                        
2019-10-15      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-10-14      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-10-11      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-10-10      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
2019-10-09      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
...             ...      ...      ...      ...      ...      ...      ...   
1995-01-09      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1995-01-06      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1995-01-05      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1995-01-04      NaN      NaN      NaN      NaN      NaN      NaN      NaN   
1995-01-03      NaN      NaN      NaN      NaN      NaN      NaN      NaN 
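A likely cause of the all-NaN output: for an integer window, `rolling` defaults `min_periods` to the window size, so any window containing even one NaN returns NaN, and with gaps scattered through 25 years of data almost every 200-row window has one. The sketch below is one possible way around that, not a definitive fix. It passes an explicit `min_periods`, drops `skipna` (rolling aggregations already skip NaNs and only gate on `min_periods`), and sorts the index oldest to newest so that `shift(1)` refers to the previous date. The function name `rolling_zscore`, the synthetic `demo` frame, and the values `window=50`, `min_periods=20` and the threshold of 4 are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def rolling_zscore(df: pd.DataFrame, window: int, min_periods: int) -> pd.DataFrame:
    """Z-score of each value against the rolling window that precedes it."""
    # Rolling windows look backwards along the rows, so run oldest to newest.
    df = df.sort_index()
    # min_periods < window lets a window with a few NaNs still produce a value.
    r = df.rolling(window=window, min_periods=min_periods)
    # shift(1) keeps the current observation out of its own mean and std.
    m = r.mean().shift(1)
    s = r.std(ddof=0).shift(1)
    return (df - m) / s


# Synthetic stand-in for fwds (hypothetical data): two trending series.
idx = pd.bdate_range("2019-01-01", periods=300)
n = np.arange(300)
demo = pd.DataFrame(
    {
        "x": np.linspace(1.0, 2.0, 300) + 0.005 * np.sin(n),
        "y": np.linspace(5.0, 4.0, 300),
    },
    index=idx,
)
demo.iloc[:100, 1] = np.nan  # "y" starts later, like the EUR column
demo.iloc[50, 0] = np.nan    # an isolated gap
demo.iloc[150, 0] = 10.0     # an obvious spike

z = rolling_zscore(demo, window=50, min_periods=20)

# mask() sets flagged cells to NaN; cells whose z-score is NaN
# (warm-up rows, existing gaps) are left untouched.
cleaned = demo.mask(z.abs() > 4)
print(cleaned.loc[idx[148]:idx[152]])
```

One design caveat: the spike stays inside the following windows and inflates their standard deviation, so a second spike shortly after the first may slip through. Computing the statistics on a rolling median, or re-running the filter on the cleaned frame, are possible variations.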
Another approach:

r = fwds.rolling(window=200)
large = r.mean() + 4 * r.std()
small = r.mean() - 4 * r.std()
print(fwds[fwds > large])
print(fwds[fwds < small])
The same goes for the max and min. Does anyone know how to deal with these damn issues when computing a rolling stdev or z-score?

Any tips appreciated. Thanks

Edit: To clarify further, I want to systematically remove the spikes in the green and brown lines from the chart:

fwds.plot()

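Welcome to Stack Overflow.... Depending on your use case (and how many crazy extreme values there are), data interpolation should fit the bill.

Since you're looking at forwards (I think), interpolation should be statistically sound unless some of your missing values are the result of a massive market dislocation.

You can use pandas' DataFrame.interpolate to fill the NaN values with interpolated values.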
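As a rough illustration on a frame rather than a single Series, the sketch below interpolates a small slice built from five of the rows shown in the question. The `demo` name is hypothetical, and the choice of `limit=1` with `limit_area="inside"` is an assumption: it fills only isolated one-row gaps that have valid data on both sides, so longer outages and the leading NaNs of a series that starts later stay missing.

```python
import numpy as np
import pandas as pd

# Five rows taken from the question's printout, oldest first.
idx = pd.to_datetime(
    ["2019-10-09", "2019-10-10", "2019-10-11", "2019-10-14", "2019-10-15"]
)
demo = pd.DataFrame(
    {
        "1y1yUSD": [1.30827, 1.39085, 1.51515, np.nan, 1.47518],
        "1y1yCAD": [1.63652, 1.72812, 1.84918, np.nan, 1.84996],
    },
    index=idx,
)

# method="linear" treats rows as equally spaced and ignores the dates.
# limit=1 fills at most one consecutive NaN per gap;
# limit_area="inside" never extrapolates past the first or last valid value.
filled = demo.interpolate(method="linear", limit=1, limit_area="inside")
print(filled)
```

With `method="linear"` the filled value for 2019-10-14 is simply the midpoint of its two neighbours; `method="time"` would weight by the calendar distance instead.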

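Filling in NaN in a Series via linear interpolation:

>>> s = pd.Series([0, 1, np.nan, 3])
>>> s
0    0.0
1    1.0
2    NaN
3    3.0
dtype: float64
>>> s.interpolate()
0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

EDIT: I just realized you're looking for market dislocations, so you may not want to use linear interpolation, since that would dampen the effect of the missing data.

Thanks! I figured I could interpolate. I'm not really looking for market dislocations, just trying to test a simple trend-following strategy. I don't think interpolation is realistic. What I ended up doing was backtesting it column by column and dropping all the NaNs. I think that is more realistic/accurate, since dates with NaNs probably weren't tradable. For anyone curious, the backtest results: 5y5y, 25 vs 100 SMA x-over. Pretty simple, just getting used to playing with pandas.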
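The column-by-column approach described in the follow-up comment (drop each column's NaNs, then compare a 25-period and a 100-period simple moving average) might look roughly like the sketch below. The asker's actual backtest is not shown in the thread, so the function name `sma_crossover_signal`, the +1/-1 signal convention, and the synthetic `demo` data are all assumptions.

```python
import numpy as np
import pandas as pd


def sma_crossover_signal(series: pd.Series, fast: int = 25, slow: int = 100) -> pd.Series:
    """+1 where the fast SMA is above the slow SMA, -1 where below, NaN during warm-up."""
    # Drop this column's own NaNs so its windows only span dates with data.
    s = series.dropna().sort_index()
    fast_sma = s.rolling(fast).mean()
    slow_sma = s.rolling(slow).mean()
    return np.sign(fast_sma - slow_sma)


# Hypothetical data: one rising and one falling series, with a gap in the first.
idx = pd.bdate_range("2019-01-01", periods=300)
demo = pd.DataFrame(
    {"up": np.linspace(1.0, 2.0, 300), "down": np.linspace(2.0, 1.0, 300)},
    index=idx,
)
demo.iloc[10:15, 0] = np.nan

# Each column is processed on its own, so the results can differ in length.
signals = {col: sma_crossover_signal(demo[col]) for col in demo.columns}
print({col: sig.dropna().iloc[-1] for col, sig in signals.items()})
```

Handling each column separately means a NaN in one series does not remove that date from the others, which a frame-wide `dropna()` would do.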