Python: how do I save Isolation Forest and Local Outlier Factor as two separate models after running them?

python pandas numpy machine-learning

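I have been trying to write a machine learning program that detects credit card fraud with the Isolation Forest and Local Outlier Factor methods, using sklearn and pandas.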

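I ran the code and got predictions, but I don't know how to save the two as separate models. I have been following some examples, but I can't tell where or how to save them. I think it is something like .save('Isolation.h5') and .save('Outlier.h5'), but I'm not sure what to put in front of .save.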

If anyone could help me understand how to save each model, I would be very grateful.

My current code:

import numpy
import pandas
import matplotlib
import seaborn
import scipy

# import the necessary packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset from the csv file using pandas
data = pd.read_csv('C:/Users/super/OneDrive/Documents/School/Spring 2020/CS 657/Final Project/creditcard.csv')

# Start exploring the dataset
print(data.columns)

data = data.sample(frac=0.1, random_state = 1)
print(data.shape)
print(data.describe())

# V1 - V28 are the results of a PCA Dimensionality reduction to protect user identities and sensitive features

# Plot histograms of each parameter 
data.hist(figsize = (20, 20))
plt.show()

# Determine number of fraud cases in dataset

Fraud = data[data['Class'] == 1]
Valid = data[data['Class'] == 0]

outlier_fraction = len(Fraud)/float(len(Valid))
print(outlier_fraction)

print('Fraud Cases: {}'.format(len(data[data['Class'] == 1])))
print('Valid Transactions: {}'.format(len(data[data['Class'] == 0])))

# Correlation matrix
corrmat = data.corr()
fig = plt.figure(figsize = (12, 9))

sns.heatmap(corrmat, vmax = .8, square = True)
plt.show()

# Get all the columns from the dataFrame
columns = data.columns.tolist()

# Filter the columns to remove data we do not want
columns = [c for c in columns if c not in ["Class"]]

# Store the variable we'll be predicting on
target = "Class"

X = data[columns]
Y = data[target]

# Print shapes
print(X.shape)
print(Y.shape)

from sklearn.metrics import classification_report, accuracy_score
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# define random states
state = 1

# define outlier detection tools to be compared
classifiers = {
    "Isolation Forest": IsolationForest(max_samples=len(X),
                                        contamination=outlier_fraction,
                                        random_state=state),
    "Local Outlier Factor": LocalOutlierFactor(
        n_neighbors=20,
        contamination=outlier_fraction)}

# Fit the model
plt.figure(figsize=(9, 7))
n_outliers = len(Fraud)


for i, (clf_name, clf) in enumerate(classifiers.items()):

    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)

    # Reshape the prediction values to 0 for valid, 1 for fraud. 
    y_pred[y_pred == 1] = 0
    y_pred[y_pred == -1] = 1

    n_errors = (y_pred != Y).sum()

    # Run classification metrics
    print('{}: {}'.format(clf_name, n_errors))
    print(accuracy_score(Y, y_pred))
    print(classification_report(Y, y_pred))
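The in-place relabelling in the loop (+1 inlier becomes 0 = valid, -1 outlier becomes 1 = fraud) can also be written as a single np.where call. A minimal sketch, assuming y_pred holds the +1/-1 labels returned by predict or fit_predict; to_fraud_labels is a hypothetical helper name, not part of sklearn.

```python
import numpy as np


def to_fraud_labels(y_pred):
    """Hypothetical helper: map sklearn outlier labels to the dataset's Class encoding.

    sklearn outlier detectors return +1 for inliers and -1 for outliers;
    the credit card data uses 0 = valid and 1 = fraud.
    """
    return np.where(np.asarray(y_pred) == -1, 1, 0)


print(to_fraud_labels([1, -1, 1, -1]))  # [0 1 0 1]
```

Unlike the two masked assignments, this returns a new array and leaves the original predictions untouched.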

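Scikit-learn estimators have no .save() method (the .save('model.h5') pattern comes from Keras). Since you loop over all the classifiers and train/predict with each of them, you can simply save each model in the same loop, for example with pickle: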

import pickle

def save_model(clf, filename):
    with open(filename, 'wb') as f:
        pickle.dump(clf, f)

for i, (clf_name, clf) in enumerate(classifiers.items()):

    # fit the data and tag outliers
    if clf_name == "Local Outlier Factor":
        y_pred = clf.fit_predict(X)
        scores_pred = clf.negative_outlier_factor_
        save_model(clf, 'Outlier.pkl')  # Saving the LOF
    else:
        clf.fit(X)
        scores_pred = clf.decision_function(X)
        y_pred = clf.predict(X)
        save_model(clf, 'Isolation.pkl')  # Saving the isolation forest

    ...

Then you can load a model with:

def load_model(filename):
    with open(filename, 'rb') as f:
        clf = pickle.load(f)
    return clf
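A minimal round-trip sketch with the same save_model/load_model helpers, using synthetic data as an assumed stand-in for the credit card features. One caveat worth checking: a LocalOutlierFactor built with the default novelty=False only offers fit_predict, so once reloaded it cannot score new transactions. Constructing it with novelty=True (then calling fit, and predict only on unseen data) makes the saved LOF usable after loading, the same way the Isolation Forest is.

```python
import pickle

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def save_model(clf, filename):
    with open(filename, 'wb') as f:
        pickle.dump(clf, f)


def load_model(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)


# Synthetic stand-in for the credit card features (assumption, not the real data)
rng = np.random.RandomState(1)
X_train = rng.normal(size=(300, 4))
# Five "normal" rows plus one obvious outlier in the last row
X_new = np.vstack([rng.normal(size=(5, 4)), [[8.0, 8.0, 8.0, 8.0]]])

iso = IsolationForest(contamination=0.01, random_state=1).fit(X_train)
# novelty=True is what makes predict() available on unseen data after loading
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01, novelty=True).fit(X_train)

save_model(iso, 'Isolation.pkl')
save_model(lof, 'Outlier.pkl')

iso_loaded = load_model('Isolation.pkl')
lof_loaded = load_model('Outlier.pkl')

# The restored models label new points (+1 = inlier, -1 = outlier)
print(iso_loaded.predict(X_new))
print(lof_loaded.predict(X_new))
```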

You can also save to another format (for example with joblib); the idea is the same whichever package you use.
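As one such alternative, joblib (installed alongside scikit-learn) offers joblib.dump and joblib.load. A sketch under assumptions: synthetic data instead of the credit card CSV, hypothetical .joblib file names, and an LOF built with novelty=True so that the reloaded model can call predict on new data.

```python
import joblib
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in for the credit card features (assumption)
rng = np.random.RandomState(1)
X = rng.normal(size=(300, 4))

models = {
    'Isolation.joblib': IsolationForest(contamination=0.01, random_state=1).fit(X),
    # novelty=True so the reloaded LOF can predict() on unseen data
    'Outlier.joblib': LocalOutlierFactor(n_neighbors=20, contamination=0.01, novelty=True).fit(X),
}

# Write one file per model
for filename, model in models.items():
    joblib.dump(model, filename)

# Read them back into a dict keyed by file name
loaded = {filename: joblib.load(filename) for filename in models}
```

Whichever format you pick, scikit-learn's documentation advises loading a persisted model with the same scikit-learn version that saved it.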