Gradient descent ANN in Python: what is MATLAB doing that I'm not?
Tags: python, matlab, machine-learning, neural-network, gradient-descent

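I am trying to recreate a simple MLP artificial neural network in Python using gradient descent backpropagation. My goal is to reproduce the accuracy that MATLAB's ANN achieves, but I am not even getting close. I am using the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and the same data (obviously). My code makes no progress in improving the results, whereas MATLAB reaches an accuracy of around 98%.

I have tried debugging through MATLAB to see what it is doing, but without much luck. I believe MATLAB scales the input data between 0 and 1 and adds a bias to the input, both of which I have done in my Python code.

What is MATLAB doing that produces such high results? Or, more likely, what am I doing wrong in my Python code that produces such poor results? All I can think of is poor weight initialisation, reading the data in incorrectly, manipulating the data incorrectly during processing, or an incorrect/poor activation function (I have also tried tanh, with the same result).

Below is my attempt, based on code I found online and slightly tweaked to read in my data, with the MATLAB script (only 11 lines of code) below that. At the bottom is a link to the dataset I use (which I also obtained through MATLAB):

Thanks for any help.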

Main.py

import numpy as np
import Process
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
import warnings


def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))


def sigmoid_prime(x):
    return sigmoid(x)*(1.0-sigmoid(x))


class NeuralNetwork:

    def __init__(self, layers):

        self.activation = sigmoid
        self.activation_prime = sigmoid_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate, epochs):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):

            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)
            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)]  => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a

# Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
nn = NeuralNetwork([13, 20, 3])
data = Process.readdata('wine')
# Split data out into input and output
X = data[0]
y = data[1]
# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

# Split data into training and test sets (15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

# Create binary output form
y_ = LabelBinarizer().fit_transform(y_train)

# Train data
lrate = 0.01
epoch = 1000
nn.fit(X_train, y_, lrate, epoch)

# Test data
err = []
for e in X_test:
    # Create array of output data (argmax to get classification)
    err.append(np.argmax(nn.predict(e)))

# Hide warnings. UndefinedMetricWarning thrown when confusion matrix returns 0 in any one of the classifiers.
warnings.filterwarnings('ignore')
# Produce confusion matrix and classification report
print(confusion_matrix(y_test, err))
print(classification_report(y_test, err))

# Plot actual and predicted data
plt.figure(figsize=(10, 8))
target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
plt.legend(handles=[target, estimated])
plt.xlabel('# Samples')
plt.ylabel('Classification Value')
plt.grid()
plt.show()

Process.py

import csv
import numpy as np


# Add constant column of 1's
def addones(arrayvar):
    return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))


def readdata(loc):
    # Open file and calculate the number of columns and the number of rows. The number of rows has a +1 as the 'next'
    # call used for num_cols has already passed over the first row.
    with open(loc + '.input.csv') as f:
        file = csv.reader(f, delimiter=',', skipinitialspace=True)
        num_cols = len(next(file))
        num_rows = len(list(file))+1

    # Create a zero'd array based on the number of column and rows previously found.
    x = np.zeros((num_rows, num_cols))
    y = np.zeros(num_rows)

    # INPUT #
    # Loop through the input file and put each row into a new row of 'samples'
    with open(loc + '.input.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            x[count] = row
            count += 1

    # OUTPUT #
    # Do the same and loop through the output file.
    with open(loc + '.output.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            y[count] = row[0]
            count += 1

    # Set data type
    x = np.array(x).astype(float)
    y = np.array(y).astype(int)

    return x, y
MATLAB script

%% LOAD DATA 
[x1,t1] = wine_dataset;

%% SET UP NN 
net = patternnet(20); 
net.trainFcn = 'traingd'; 
net.layers{2}.transferFcn = 'logsig'; 
net.derivFcn = 'logsig';

%% TRAIN AND TEST
[net,tr] = train(net,x1,t1);
The data files can be downloaded here:

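I think you have confused the terms epoch and step. If you have trained for one epoch, that usually means you have run through all of the data once.

For example: if you have 10,000 samples, then you have put all 10,000 samples (disregarding random sampling of the samples) through your model and taken one step (updated the weights) each time.

To fix: run your network for longer:

nn.fit(X_train, y_, lrate, epoch*len(X))

Bonus: MATLAB's documentation translates epochs to (iterations), which is misleading, but its comment on the term is basically what I wrote above.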
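
To make the epoch/step distinction concrete, here is a minimal sketch. `epoch_indices` is a hypothetical helper, not part of the question's code, and the sizes are made up. It visits every sample exactly once per epoch, so the number of weight updates is the number of samples times the number of epochs. The `fit` method in the question instead draws a single random sample per loop iteration, so its `epochs` argument really counts steps.

```python
import numpy as np


def epoch_indices(n_samples, n_epochs, rng):
    """Yield (epoch, sample_index) pairs; each sample appears once per epoch."""
    for epoch in range(n_epochs):
        # A fresh random permutation per epoch covers the whole dataset once.
        for i in rng.permutation(n_samples):
            yield epoch, int(i)


rng = np.random.default_rng(0)
n_samples, n_epochs = 5, 3  # hypothetical sizes, for illustration only
steps = list(epoch_indices(n_samples, n_epochs, rng))

# One step (weight update) per yielded pair
print(len(steps), "steps for", n_epochs, "epochs")
```

Multiplying the step count by the dataset size, as in the fix above, gives the same number of updates without guaranteeing that each sample is seen equally often.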

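I believe I have found the problem. It was a combination of the dataset itself (not all datasets have this problem) and the way I scaled the data. My original scaling method, which mapped the values to between 0 and 1, did not help here and led to the poor results: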

# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

I found another scaling method, provided by the sklearn preprocessing package:

from sklearn import preprocessing
X = preprocessing.scale(X)
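
This scaling method does not map the values to between 0 and 1, and I still need to investigate why it helped so much, but the results now come back with an accuracy of 96% to 100%. That is very close to the MATLAB results, and I assume MATLAB uses a similar (or the same) preprocessing scaling method.

As mentioned above, this is not the case for all datasets. Using the built-in sklearn iris or digits datasets seemed to produce good results without scaling.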
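
One plausible reason for the difference, sketched below on made-up data rather than the wine dataset: `X -= X.min(); X /= X.max()` uses a single minimum and maximum for the whole array, so a feature with a small range is squashed towards zero when another feature has a much larger range. `preprocessing.scale` standardises each column separately by default. The array and variable names here are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: two features on very different scales
X = np.array([[0.5, 1000.0],
              [1.0, 1500.0],
              [1.5,  500.0],
              [2.0, 2000.0]])

# Global min-max, equivalent to the question's X -= X.min(); X /= X.max()
X_global = (X - X.min()) / (X.max() - X.min())

# Per-feature min-max: every column spans the full [0, 1] range
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Per-feature standardisation (zero mean, unit variance per column),
# which is what preprocessing.scale(X) computes with its default arguments
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Range covered by each feature after each kind of scaling
print("global min-max :", X_global.max(axis=0) - X_global.min(axis=0))
print("per-feature    :", X_minmax.max(axis=0) - X_minmax.min(axis=0))
print("standardised   :", X_std.std(axis=0))
```

With global scaling the first feature ends up occupying a tiny fraction of the [0, 1] interval, so it contributes almost nothing to the network's input.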

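Maybe this is something you have already done (as you mentioned debugging), but have a look inside the MATLAB training function:
edit train.m

Thanks, Mikkola, but I have indeed already looked at train.m. Is there anything specific I should look for? I noticed that MATLAB vectorises its weights, whereas my code loops over each weight layer. If the equations are the same (and I believe they are), this should produce the same results.

Thanks for your input, Aske, but this did not solve the problem. After debugging through MATLAB, I found that its code simply loops over the entire dataset for the number of epochs (1000), the same as my original code. That said, I implemented your suggestion and, as you would expect, it improved the results; however, it is still 20% below MATLAB's 98% accuracy. I also ran the code for 1,000,000 epochs, which did not help much either and suggests that the epoch number alone cannot solve this. There seems to be more going on under MATLAB's hood.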
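
The last comment describes a loop that processes the whole dataset once per epoch. A minimal sketch of such a full-batch update follows, for a single sigmoid unit on toy data. `batch_gd`, the data and the hyperparameters are assumptions for illustration; this is neither MATLAB's implementation nor the network from the question.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def batch_gd(X, y, learning_rate, epochs):
    """Full-batch gradient descent for one sigmoid unit with squared error."""
    Xb = np.hstack((np.ones((X.shape[0], 1)), X))  # prepend bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        out = sigmoid(Xb @ w)
        # Gradient averaged over ALL samples: one weight update per epoch.
        # For a sigmoid output, d(out)/dz = out * (1 - out).
        grad = Xb.T @ ((out - y) * out * (1.0 - out)) / len(y)
        w -= learning_rate * grad
    return w


# Toy, linearly separable data (hypothetical)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = batch_gd(X, y, learning_rate=0.5, epochs=5000)
pred = sigmoid(np.hstack((np.ones((4, 1)), X)) @ w) > 0.5
print(pred)
```

Here 5000 epochs means 5000 weight updates, each computed from every sample, whereas the question's `fit` performs one update from one randomly chosen sample per iteration.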