Python sklearn OneClassSVM gives only false positives
I am writing a simple API that outputs the outliers in a one-dimensional dataset. For some reason, sklearn.svm.OneClassSVM flags every point as an outlier, even though the data suggests it should do the opposite. Here is a problematic example:
select valor, count(*) as no_occurrences
from medicao_eta
where dt_registro_programada between '2020-10-01 00:00:00:000' and '2020-10-13 23:59:59:997'
and cod_configuracao = 1331
group by valor
order by count(*) desc
This gives me 293 records with only two distinct values:
+--------+----------------+
| valor | no_occurrences |
+--------+----------------+
| 1.3220 | 292 |
| 1.3200 | 1 |
+--------+----------------+
Python code:
def why_on_earth():
    import datetime
    import numpy as np
    from sklearn.svm import OneClassSVM

    # Fetch the measurements for the period and configuration under test.
    db = DbUtil()
    unidade_operacional = db.fetch_unidade_operacional(73)
    test_data = db.fetch_medicoes(unidade_operacional=unidade_operacional,
                                  data_inicio=datetime.datetime(2020, 10, 1),
                                  data_fim=datetime.datetime(2020, 10, 13),
                                  cod_configuracao=1331)

    # Build an (n_samples, 1) array, as scikit-learn expects 2-D input.
    data = [row.valor for row in test_data[1331]]
    x_numpy = np.asarray(data)
    x = x_numpy.reshape(-1, 1)

    svm = OneClassSVM(kernel='rbf', gamma='scale', nu=0.001)
    pred = svm.fit_predict(x)

    # fit_predict labels outliers as -1 and inliers as 1.
    indexes = [i for i, label in enumerate(pred) if label == -1]
    outliers = [test_data[1331][i] for i in indexes]
    return outliers
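I have checked the data going into the algorithm a thousand times. What I see is 293 decimal values forming a two-dimensional array, like [[1.322], [1.322], [1.322] ...]. Shouldn't the algorithm output only a single outlier (1.3200)? The pred array contains nothing but -1, which makes no sense at all. I am really stuck here.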
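To try this without the database, the following is a minimal sketch that rebuilds the same values from the counts in the table above (292 readings of 1.322 and one of 1.32) and fits the model with the same parameters. It assumes plain Python floats rather than whatever type the database driver returns, and it puts the single 1.32 reading last, which may not match the original row order. The helper names build_sample and reproduce are made up for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def build_sample():
    """Rebuild the 293 readings from the counts in the query result.

    Assumption: plain floats, with the single 1.32 reading placed last.
    """
    values = [1.322] * 292 + [1.32]
    return np.asarray(values, dtype=float).reshape(-1, 1)


def reproduce(x):
    """Fit with the same parameters as in the question and return the labels."""
    svm = OneClassSVM(kernel='rbf', gamma='scale', nu=0.001)
    return svm.fit_predict(x)


x = build_sample()
pred = reproduce(x)

# Count how many points were labelled inlier (1) and outlier (-1).
labels, counts = np.unique(pred, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

The printed dictionary shows how many points received each label, which makes it easy to compare the behaviour across scikit-learn versions or parameter values.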
Here is the data:
Thanks in advance.