
Python: anomaly detection with k nearest neighbors?

Tags: python, scikit-learn, knn, outliers


I want to build a network intrusion detection system based on 19 traffic-statistics features. I have already had success with a one-class SVM (a sketch of that baseline follows the file list below), but I have heard that k nearest neighbors can also perform this task. As before, I have an anomaly-free training dataset and a testing dataset that contains some anomalies together with the corresponding labels (1 for normal, -1 for anomalous).

training_samples.csv (first 200 samples; the full file contains ~1200)

testing_samples.csv (first 100 samples; the full file contains 193)

testing_labels.csv (first 100 labels; the full file contains 193)
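
For reference, the one-class SVM baseline mentioned in the question might look roughly like this (a minimal sketch, assuming a recent scikit-learn; the nu value is an illustrative guess, not the asker's actual setting, and the scaler is fitted on the training data only):

import numpy
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import MinMaxScaler

X_train = numpy.loadtxt("training_samples.csv", delimiter=",")
X_test = numpy.loadtxt("testing_samples.csv", delimiter=",")

# Fit the scaler on the training data only, then apply it to both sets.
scaler = MinMaxScaler().fit(X_train)
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(scaler.transform(X_train))

# predict() returns +1 for inliers and -1 for outliers, matching the labels file.
print(ocsvm.predict(scaler.transform(X_test)))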

I am using scikit-learn's KNeighborsClassifier implementation, but all the predicted labels come out as 1:

#!/usr/bin/python
import csv, numpy
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# To save time and avoid attribute reconstruction, we have prebuilt training and testing files 
# where the attributes are presented under CSV format.
# We just need to convert these files into matrices so they can be used directly as input 
# of the machine learning algorithms.

def csv_attributes_to_matrix(csv_file):
    with open(csv_file, 'r') as data:
        rows = csv.reader(data, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
        # Caveat: a fresh scaler is fitted per file, so the training and testing
        # matrices end up scaled against different min/max ranges.
        return MinMaxScaler().fit_transform(numpy.array(list(rows)))

def csv_labels_to_matrix(csv_file):
    with open(csv_file, 'r') as data:
        rows = csv.reader(data, delimiter=',')
        return [int(row[0]) for row in rows]

# Create the vector of "normal" labels for training => all-ones vector
def create_normal_vectors(n_samples):
    return numpy.ones(n_samples, dtype=int)

# Test of KNeighborsClassifier for anomaly detection
def kNN_test(MATRIX_NORM, MATRIX_ANOM, real_labels):
    Y = create_normal_vectors(len(MATRIX_NORM))

    # Parameter grid search    
    for n_neighbors in [1, 2, 3, 5, 10]:
        for weights in ["uniform", "distance"]:
            for algo in ["ball_tree", "kd_tree", "brute"]:
                for p in [1, 5, 10]:
                    # leaf_size only matters for the tree-based algorithms; use
                    # the default (30) for brute force instead of None.
                    for leaf_size in ([1, 5, 10] if algo in ["ball_tree", "kd_tree"] else [30]):
                        trained_model = KNeighborsClassifier(n_neighbors=n_neighbors, weights=weights,
                                                             algorithm=algo, leaf_size=leaf_size, p=p)
                        trained_model.fit(MATRIX_NORM, Y)
                        predicted_labels = trained_model.predict(MATRIX_ANOM)
                        # Predicted labels are always all set to 1 -- why?
                        print((n_neighbors, weights, algo, p, leaf_size), "\n", predicted_labels)

# Normal (training) and anomalous (testing) input csv files:
MATRIX_NORM = csv_attributes_to_matrix("training_samples.csv")
MATRIX_ANOM = csv_attributes_to_matrix("testing_samples.csv")
real_labels = csv_labels_to_matrix("testing_labels.csv")

# Launch test
kNN_test(MATRIX_NORM, MATRIX_ANOM, real_labels)
Can a k-nearest-neighbors algorithm (one from another library, if not from sklearn) be used to perform this kind of novelty/outlier detection?

I'd suggest giving the description of k nearest neighbors a read. In particular:

"An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small)."

The training set you build in create_normal_vectors says that every point is "normal", so when an unlabeled point asks its neighbors which class it belongs to, every one of them will vote for "class 1". That is why every prediction comes out as 1.
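
A tiny self-contained example (toy data, not the asker's files) makes this concrete: with only class 1 in the training set, every neighbor votes for class 1, so predict() can never return anything else:

import numpy
from sklearn.neighbors import KNeighborsClassifier

X_train = numpy.random.rand(50, 19)       # 50 "normal" points, 19 features
y_train = numpy.ones(50, dtype=int)       # every training label is 1
X_test = numpy.random.rand(10, 19) * 100  # wildly out-of-range points

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(model.predict(X_test))  # [1 1 1 ...] -- the only class it has ever seen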


Have a look at the documentation for approaches that are designed for this.
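
One kNN-based option in the same sklearn.neighbors module is LocalOutlierFactor, which since scikit-learn 0.20 supports novelty detection: fit it on the clean training data, then let predict() flag new points. A minimal sketch (n_neighbors here is a placeholder to tune, not a recommendation):

import numpy
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import MinMaxScaler

X_train = numpy.loadtxt("training_samples.csv", delimiter=",")
X_test = numpy.loadtxt("testing_samples.csv", delimiter=",")

scaler = MinMaxScaler().fit(X_train)
# novelty=True: train on anomaly-free data, then score unseen points.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(scaler.transform(X_train))

# Returns +1 for inliers and -1 for outliers, the same convention as the labels.
print(lof.predict(scaler.transform(X_test)))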

Thanks for the clarification about kNN. I have already tried a one-class SVM, which works well, and also the EllipticEnvelope estimator, which does not work because our data is not Gaussian. So the idea was to use another algorithm, such as kNN, to detect the outliers. But I can see that sklearn's KNeighborsClassifier is not suited to this; perhaps I should use a kNN implementation from another library (see the sketch below) and tune it as necessary.
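
For a kNN detector from another library, PyOD ships one; a minimal sketch follows (assuming pip install pyod; n_neighbors and contamination are placeholders to tune). Note that PyOD's label convention differs, using 0 for inliers and 1 for outliers:

import numpy
from pyod.models.knn import KNN
from sklearn.preprocessing import MinMaxScaler

X_train = numpy.loadtxt("training_samples.csv", delimiter=",")
X_test = numpy.loadtxt("testing_samples.csv", delimiter=",")

scaler = MinMaxScaler().fit(X_train)
# method="largest": score each point by the distance to its k-th neighbor.
detector = KNN(n_neighbors=5, method="largest", contamination=0.05)
detector.fit(scaler.transform(X_train))

# Map PyOD's 0 (inlier) / 1 (outlier) onto the 1 / -1 convention used here.
pred = detector.predict(scaler.transform(X_test))
print(numpy.where(pred == 1, -1, 1))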
testing_samples.csv (first 100 samples):

7, 3, 2, 1, 2, 0, 2, 5, 5, 1, 1, 106, 250.571428571, 52252.244898, 612, 0, 0, 0, 0
7, 3, 2, 1, 2, 0, 3, 4, 4, 1, 1, 106, 322.857142857, 62702.6938776, 612, 0, 0, 0, 0
6, 3, 2, 1, 2, 0, 3, 3, 3, 1, 1, 106, 359.0, 64009.0, 612, 0, 0, 0, 0
7, 3, 2, 1, 2, 0, 2, 5, 5, 1, 1, 106, 250.571428571, 52252.244898, 612, 0, 0, 0, 0
7, 3, 2, 1, 2, 0, 2, 5, 5, 1, 1, 106, 250.571428571, 52252.244898, 612, 0, 0, 0, 0
7, 3, 2, 1, 2, 0, 4, 3, 4, 1, 1, 106, 395.142857143, 62702.6938776, 612, 0, 0, 0, 0
6, 3, 2, 1, 2, 0, 2, 4, 4, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
1272, 3, 2, 2, 3, 0, 2, 5, 5, 1, 1, 42, 43.572327044, 532.118982635, 612, 1205, 0, 0, 0
5664, 1, 1, 2, 2, 0, 0, 5, 5, 1, 1, 42, 42.113700565, 4.63255240751, 106, 5623, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
18, 4, 3, 2, 2, 0, 6, 8, 8, 1, 1, 92, 271.555555556, 57980.2469136, 612, 0, 0, 0, 0
12, 4, 3, 2, 2, 0, 4, 6, 6, 1, 1, 92, 272.333333333, 57711.2222222, 612, 0, 0, 0, 0
12, 4, 3, 2, 2, 0, 3, 7, 7, 1, 1, 92, 230.166666667, 48624.3055556, 612, 0, 0, 0, 0
18, 4, 3, 2, 2, 0, 6, 8, 8, 1, 1, 92, 271.555555556, 57980.2469136, 612, 0, 0, 0, 0
14, 4, 3, 2, 2, 0, 4, 7, 7, 1, 1, 92, 247.571428571, 53152.6734694, 612, 0, 0, 0, 0
1660, 3, 3, 3, 2, 174, 1652, 1652, 1652, 1, 1, 57, 57.2108433735, 9.40132820438, 106, 0, 0, 0, 0
190, 5, 4, 3, 3, 24, 180, 176, 176, 1, 1, 57, 70.9684210526, 6391.23058172, 612, 0, 0, 0, 0
14, 4, 3, 2, 2, 0, 4, 8, 8, 1, 1, 92, 248.571428571, 52854.5306122, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
14, 3, 2, 1, 2, 0, 5, 9, 9, 1, 1, 106, 286.714285714, 58783.7755102, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
40, 2, 2, 2, 2, 0, 33, 33, 7, 1, 17, 64, 71.35, 254.6775, 106, 0, 0, 0, 0
18, 4, 3, 2, 3, 0, 11, 7, 7, 1, 4, 64, 202.111111111, 48345.5432099, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
14, 3, 2, 1, 2, 0, 7, 7, 7, 1, 1, 106, 359.0, 64009.0, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
242, 3, 3, 2, 3, 238, 1, 3, 3, 1, 1, 106, 430.669421488, 1453.51881702, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
14, 4, 3, 2, 2, 0, 4, 8, 8, 1, 1, 106, 270.142857143, 48891.5510204, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 6, 6, 6, 1, 1, 106, 359.0, 64009.0, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
34, 4, 6, 2, 4, 0, 13, 8, 6, 1, 1, 42, 144.882352941, 38046.633218, 612, 1, 0, 0, 0
138, 11, 21, 2, 3, 0, 56, 38, 18, 1, 1, 42, 56.768115942, 186.323041378, 106, 1, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 3, 7, 7, 1, 1, 106, 257.8, 53767.56, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 6, 6, 6, 1, 1, 106, 359.0, 64009.0, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
20, 4, 4, 2, 3, 0, 4, 9, 7, 1, 1, 102, 205.4, 41334.04, 612, 0, 0, 0, 0
576, 4, 4, 2, 3, 0, 4, 565, 283, 1, 1, 102, 105.590277778, 1793.55434992, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
14, 3, 2, 1, 2, 0, 6, 8, 8, 1, 1, 106, 322.857142857, 62702.6938776, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
14, 3, 2, 1, 2, 0, 5, 9, 9, 1, 1, 106, 286.714285714, 58783.7755102, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
14, 3, 2, 1, 3, 0, 6, 8, 8, 1, 1, 106, 322.857142857, 62702.6938776, 612, 0, 0, 0, 0
14, 4, 3, 2, 3, 0, 6, 8, 8, 1, 2, 60, 171.428571429, 32598.5306122, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 3, 7, 7, 1, 1, 106, 257.8, 53767.56, 612, 0, 0, 0, 0
12, 2, 2, 1, 2, 0, 3, 9, 9, 1, 1, 106, 232.5, 48006.75, 612, 0, 0, 0, 0
14, 4, 3, 2, 3, 0, 6, 8, 8, 1, 2, 60, 172.285714286, 32487.3469388, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 3, 7, 7, 1, 1, 106, 257.8, 53767.56, 612, 0, 0, 0, 0
14, 3, 3, 2, 3, 0, 4, 6, 6, 1, 1, 42, 162.857142857, 34231.8367347, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 2, 60, 182.666666667, 37147.5555556, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 3, 7, 7, 1, 1, 106, 257.8, 53767.56, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
12, 4, 3, 2, 3, 0, 5, 7, 7, 1, 2, 60, 185.833333333, 36478.3055556, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 3, 7, 7, 1, 2, 60, 202.6, 42087.24, 612, 0, 0, 0, 0
10, 2, 2, 1, 2, 0, 2, 8, 8, 1, 1, 106, 207.2, 40965.76, 612, 0, 0, 0, 0
14, 3, 2, 1, 3, 0, 6, 8, 8, 1, 1, 106, 322.857142857, 62702.6938776, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 6, 6, 6, 1, 1, 106, 359.0, 64009.0, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
32, 3, 3, 2, 3, 0, 21, 20, 7, 1, 10, 42, 74.9375, 9984.49609375, 612, 0, 0, 0, 0
12, 3, 2, 1, 3, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 5, 7, 7, 1, 1, 106, 316.833333333, 62230.9722222, 612, 0, 0, 0, 0
12, 3, 2, 1, 2, 0, 4, 8, 8, 1, 1, 106, 274.666666667, 56896.8888889, 612, 0, 0, 0, 0
10, 3, 2, 1, 2, 0, 4, 6, 6, 1, 1, 106, 308.4, 61448.64, 612, 0, 0, 0, 0
testing_labels.csv (first 100 labels):

1
1
1
1
1
1
1
-1
-1
1
1
1
1
1
1
1
1
1
-1
-1
1
1
1
1
1
1
1
1
-1
-1
1
1
1
1
1
1
1
1
-1
1
1
1
1
1
1
1
1
-1
-1
1
1
1
1
1
1
1
1
-1
-1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1