Python: How can I reduce false negatives in my neural network?

Tags: python, keras, neural-network

I am using this dataset:

I am trying to train a neural network that predicts fraud, based on a variable called 'fraud' (a dummy variable where 1 means fraud and 0 means not fraud).

The result is that my model correctly predicts the non-fraudulent users (8429) but fails to detect the frauds (68): when I print the confusion matrix, all 68 frauds show up as false negatives. Here is my code:

import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from random import randint
import numpy as np  # needed below for np.array and np.count_nonzero

# dataset is assumed to be a pandas DataFrame loaded beforehand, e.g. via pd.read_csv
y = dataset.pop('fraud')  # creating a 'y' variable, equal to the dummy variable
dataset = dataset.drop(['cust_id','activated_date','last_payment_date','Unnamed: 0'], axis=1)  # dropping the variables that are not useful for my model
x = dataset.values  # a new 'x' variable with the remaining values of dataset
y = np.array(y)  # converting 'y' values to a numpy array
counter = np.count_nonzero(y)  # np.count_nonzero counts the nonzero labels, i.e. the fraudulent users
print("Data from not fraudulent users: " + str(len(y)-counter))
print("Data from fraudulent users: " + str(counter))
print("Percentage of not fraudulent: " + str(100*(1-(counter/len(y)))))
print("Percentage of fraudulent: " + str(100*((counter/len(y)))))

Oversampling the training dataset:

from imblearn.over_sampling import RandomOverSampler

oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)
counter = np.count_nonzero(y_over)
print("Amount of data that is not fraudulent: " + str(len(y_over)-counter))
print("Amount that is indeed fraudulent: " + str(counter))
print("Percentage of non fraud: " + str(100*(1-(counter/len(y_over)))))
print("Percentage of fraud: " + str(100*((counter/len(y_over)))))
Defining, compiling, and training my neural network:

epochs = 90
model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(17,), activation='relu'),  # first layer with 17 input variables and 32 nodes
    keras.layers.Dense(64, activation='relu'),  # 2nd layer with 64 nodes and ReLU activation
    keras.layers.Dense(32, activation='relu'),  # 3rd layer with 32 nodes and ReLU activation
    keras.layers.Dense(2, activation='sigmoid')  # 4th layer with only two nodes and sigmoid (logistic) activation for the 1-or-0 prediction
])
model.summary()
# compile the model with the SGD optimizer and sparse categorical cross-entropy loss
model.compile(optimizer='SGD',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_over, y_over, epochs=epochs, validation_split=0.25, verbose=1)  # fit the oversampled training set on our neural network
Finally, my confusion matrix:

comp = model.predict(x)  # predicted class scores for every row
comp = np.array([np.argmax(u) for u in comp])  # keep the most likely class (0 or 1)
cm = confusion_matrix(y_true=y, y_pred=comp)
print(cm)
[[8429    0]
 [  68    0]]
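
As an aside, f1_score is imported above but never used; since the question is about false negatives, recall on the fraud class is the number to watch. A small sketch using the y and comp arrays from above:

from sklearn.metrics import classification_report, f1_score, recall_score

print("F1 (fraud class):    ", f1_score(y, comp))
print("Recall (fraud class):", recall_score(y, comp))  # fraction of real frauds actually caught
# note: with the all-zero predictions above, both scores come out as 0
print(classification_report(y, comp, target_names=['not fraud', 'fraud']))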

What can I do to improve my model and get those 68 false negatives classified correctly?

The problem here is that your dataset is extremely imbalanced, so you need to apply methods designed for this kind of dataset. Take a look at this example: . I believe it actually uses the exact same dataset, which makes it very convenient.

Comments:
"Hi, did you normalize the data in some way?" — "@MustafaAydın No, I did not."
"Before turning to a neural network, did you try a plain XGBoost/CatBoost model?"
"No, it should be either 2 neurons with softmax or 1 neuron with sigmoid, as @MarcoCerliani pointed out (the loss function is different in each case)."
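
Pulling the answer and comments together, here is a minimal sketch of what the suggested fixes could look like: scale the features (one comment asks about normalization), and replace the 2-unit sigmoid output with a single sigmoid unit plus binary cross-entropy (or, equivalently, keep 2 units but use softmax with sparse categorical cross-entropy). The layer sizes come from the question; the optimizer choice, the scaling, and the example class weights are illustrative assumptions, not the answerer's exact code.

from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# normalize the features; unscaled inputs often stall training
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x_over)

model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(17,), activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    # single output unit: matches binary_crossentropy below
    # (2 units with softmax + sparse_categorical_crossentropy also works)
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',  # illustrative choice
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_scaled, y_over, epochs=90, validation_split=0.25, verbose=1)
# as an alternative to oversampling, Keras can weight the rare class instead:
# model.fit(..., class_weight={0: 1.0, 1: 10.0})  # weights are illustrative

# predictions are now probabilities; threshold at 0.5 for the confusion matrix
preds = (model.predict(scaler.transform(x)) > 0.5).astype(int).ravel()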