Python: How can I reduce false negatives in my neural network?

Tags: python, keras, neural-network

I am using this dataset:

I am trying to train a neural network that predicts fraud, based on a variable called 'fraud' (a dummy variable where 1 means fraud and 0 means not fraud).

The result is that my model correctly predicts the non-fraudulent users (8429) but fails to detect the frauds (68): when I print the confusion matrix, all 68 frauds show up as false negatives. Here is my code:

import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import confusion_matrix
from sklearn.metrics import f1_score
from random import randint
import numpy as np  # needed below for np.array and np.count_nonzero

# dataset is assumed to be a pandas DataFrame loaded beforehand, e.g. via pd.read_csv
y = dataset.pop('fraud')  # creating a 'y' variable, equal to the dummy variable
dataset = dataset.drop(['cust_id','activated_date','last_payment_date','Unnamed: 0'], axis=1)  # dropping the variables that are not useful for my model
x = dataset.values  # a new 'x' variable with the remaining values of dataset
y = np.array(y)  # converting 'y' values to a numpy array
counter = np.count_nonzero(y)  # np.count_nonzero counts the nonzero labels, i.e. the fraudulent users
print("Data from not fraudulent users: " + str(len(y)-counter))
print("Data from fraudulent users: " + str(counter))
print("Percentage of not fraudulent: " + str(100*(1-(counter/len(y)))))
print("Percentage of fraudulent: " + str(100*((counter/len(y)))))

Oversampling the training dataset:

from imblearn.over_sampling import RandomOverSampler

oversample = RandomOverSampler(sampling_strategy='minority')
x_over, y_over = oversample.fit_resample(x, y)
counter = np.count_nonzero(y_over)
print("Amount of data that is not fraudulent: " + str(len(y_over)-counter))
print("Amount that is indeed fraudulent: " + str(counter))
print("Percentage of non fraud: " + str(100*(1-(counter/len(y_over)))))
print("Percentage of fraud: " + str(100*((counter/len(y_over)))))
Defining, compiling, and training my neural network:

epochs = 90
model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(17,), activation='relu'),  # first layer with 17 input variables and 32 nodes
    keras.layers.Dense(64, activation='relu'),  # 2nd layer with 64 nodes and ReLU activation
    keras.layers.Dense(32, activation='relu'),  # 3rd layer with 32 nodes and ReLU activation
    keras.layers.Dense(2, activation='sigmoid')  # 4th layer with only two nodes and sigmoid (logistic) activation for the 1-or-0 prediction
])
model.summary()
# compile the model with the SGD optimizer and sparse categorical cross-entropy loss
model.compile(optimizer='SGD',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_over, y_over, epochs=epochs, validation_split=0.25, verbose=1)  # fit the oversampled training set on our neural network
Finally, my confusion matrix:

comp = model.predict(x)  # predicted class scores for every row
comp = np.array([np.argmax(u) for u in comp])  # keep the most likely class (0 or 1)
cm = confusion_matrix(y_true=y, y_pred=comp)
print(cm)
[[8429    0]
 [  68    0]]
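
As an aside, f1_score is imported above but never used; since the question is about false negatives, recall on the fraud class is the number to watch. A small sketch using the y and comp arrays from above:

from sklearn.metrics import classification_report, f1_score, recall_score

print("F1 (fraud class):    ", f1_score(y, comp))
print("Recall (fraud class):", recall_score(y, comp))  # fraction of real frauds actually caught
# note: with the all-zero predictions above, both scores come out as 0
print(classification_report(y, comp, target_names=['not fraud', 'fraud']))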

What can I do to improve my model and get those 68 false negatives classified correctly?

The problem here is that your dataset is extremely imbalanced, so you need to apply methods designed for this kind of dataset. Take a look at this example: . I believe it actually uses the exact same dataset, which makes it very convenient.

Comments:
"Hi, did you normalize the data in some way?" — "@MustafaAydın No, I did not."
"Before turning to a neural network, did you try a plain XGBoost/CatBoost model?"
"No, it should be either 2 neurons with softmax or 1 neuron with sigmoid, as @MarcoCerliani pointed out (the loss function is different in each case)."
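
Pulling the answer and comments together, here is a minimal sketch of what the suggested fixes could look like: scale the features (one comment asks about normalization), and replace the 2-unit sigmoid output with a single sigmoid unit plus binary cross-entropy (or, equivalently, keep 2 units but use softmax with sparse categorical cross-entropy). The layer sizes come from the question; the optimizer choice, the scaling, and the example class weights are illustrative assumptions, not the answerer's exact code.

from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# normalize the features; unscaled inputs often stall training
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x_over)

model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(17,), activation='relu'),
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(32, activation='relu'),
    # single output unit: matches binary_crossentropy below
    # (2 units with softmax + sparse_categorical_crossentropy also works)
    keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',  # illustrative choice
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_scaled, y_over, epochs=90, validation_split=0.25, verbose=1)
# as an alternative to oversampling, Keras can weight the rare class instead:
# model.fit(..., class_weight={0: 1.0, 1: 10.0})  # weights are illustrative

# predictions are now probabilities; threshold at 0.5 for the confusion matrix
preds = (model.predict(scaler.transform(x)) > 0.5).astype(int).ravel()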