Training a neural network to perform addition


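I need to train a network to multiply or add 2 inputs, but it does not seem to approximate well for all points after 20000 iterations. More specifically, I train it on the whole dataset and it approximates the last points well, but it does not seem to get any better for the first points. I normalize the data so that it lies between -0.8 and 0.8. The network itself consists of 2 inputs, 3 hidden neurons, and 1 output neuron. I also set the network's learning rate to 0.25 and use tanh(x) as the activation function.

It approximates the points trained last in the dataset very closely, but it does not approximate the first points well. I would like to know what is preventing it from adjusting well: is it the topology I am using, or something else?

Also, how many neurons are appropriate in the hidden layer for this network?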

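Think about what would happen if you replaced your tanh(x) threshold function with a linear function of x (call it a.x) and treated a as the sole learning parameter in each neuron. That is effectively what your network will be optimising towards; it is an approximation of the tanh function around its zero crossing.

Now, what happens when you layer neurons of this linear type? You multiply the output of each neuron as the pulse goes from input to output. You are trying to approximate addition with a set of multiplications. That, as they say, does not compute.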
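The linear approximation mentioned above can be checked numerically. The following is a minimal sketch, assuming a slope of a = 1 (the slope of tanh at zero); the helper name `linearization_error` is made up for illustration. It shows that tanh(x) is very close to x near zero and that the gap grows toward the edge of the normalized range, 0.8.

```python
import math


def linearization_error(x: float) -> float:
    """Absolute gap between tanh(x) and the linear approximation a*x with a = 1."""
    return abs(math.tanh(x) - x)


# Sample points from near zero up to the edge of the normalized input range.
for x in (0.01, 0.1, 0.4, 0.8):
    print(f"x={x:<5} tanh(x)={math.tanh(x):.4f} error={linearization_error(x):.4f}")
```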

A network consisting of a single neuron with weights {1,1}, bias 0, and a linear activation function adds its two inputs.
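As a quick illustration of that claim, here is a minimal sketch of such a neuron in plain Python. The names `linear_neuron` and `ADD_WEIGHTS` are illustrative and not taken from any library.

```python
def linear_neuron(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the identity activation."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Weights {1, 1} and bias 0 turn the weighted sum into plain addition.
ADD_WEIGHTS = (1.0, 1.0)

print(linear_neuron((0.3, -0.5), ADD_WEIGHTS))
print(linear_neuron((12, 11), ADD_WEIGHTS))
```

Because nothing here is learned from a bounded range, the same neuron also works for inputs far outside any training interval.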

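Multiplication may be harder. Here are two approaches that a network can use:

1. Convert one of the numbers to digits (e.g., binary) and perform the multiplication as you did in grade school: a*b = a*(b0*2^0 + b1*2^1 + … + bk*2^k) = a*b0*2^0 + a*b1*2^1 + … + a*bk*2^k. This approach is simple, but it needs a variable number of neurons proportional to the length (logarithm) of the input b.

2. Take the logarithms of the inputs, add them, and exponentiate the result: a*b = exp(ln(a) + ln(b)). This network can work on numbers of any length, as long as it can approximate the logarithm and the exponential well enough.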
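Both identities can be checked with ordinary arithmetic before trying to make a network learn them. The sketch below is not a neural network; the function names are hypothetical. Note that the logarithm version only applies to strictly positive inputs, so data normalized to [-0.8, 0.8] would first have to be shifted or handled by sign.

```python
import math


def multiply_by_bits(a: int, b: int) -> int:
    """a*b = a*b0*2^0 + a*b1*2^1 + ... + a*bk*2^k for a non-negative integer b."""
    total, k = 0, 0
    while b >> k:                  # stop once no bits of b remain above position k
        bit = (b >> k) & 1         # b_k, the k-th binary digit of b
        total += a * bit * 2 ** k
        k += 1
    return total


def multiply_by_logs(a: float, b: float) -> float:
    """a*b = exp(ln(a) + ln(b)); only valid for a > 0 and b > 0."""
    return math.exp(math.log(a) + math.log(b))


print(multiply_by_bits(13, 11))
print(multiply_by_logs(3.0, 4.0))
```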

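If you want to keep things neural (links have weights, each neuron computes the weighted sum of its inputs and answers 0 or 1 depending on the sigmoid of that sum, and you train with backpropagation of the gradient), then you should think of the neurons of the hidden layer as classifiers. Each one defines a line that separates the input space into two classes: one class is the region where the neuron responds 1, the other is the region where it responds 0. A second neuron of the hidden layer defines another separation, and so on. The output neuron combines the outputs of the hidden layer, adapting its weights so that its output matches the targets presented during learning.

Hence, a single neuron classifies the input space into 2 classes (which may correspond to an addition, depending on the learning database). Two neurons can define 4 classes, three neurons 8 classes, and so on. Think of the outputs of the hidden neurons as powers of 2: h1*2^0 + h2*2^1 + … + hn*2^(n-1), where hi is the output of hidden neuron i. NB: you will need n output neurons. This answers the question about the number of hidden neurons to use.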
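The counting argument above can be sketched as follows, assuming each hidden neuron outputs exactly 0 or 1. The helper names are illustrative; the hidden outputs are read as a binary number h1*2^0 + h2*2^1 + …, with the last of n neurons contributing 2^(n-1).

```python
from itertools import product


def class_index(hidden_outputs):
    """Read binary hidden outputs (h1, ..., hn) as h1*2^0 + h2*2^1 + ... + hn*2^(n-1)."""
    return sum(h * 2 ** i for i, h in enumerate(hidden_outputs))


def reachable_classes(n_hidden):
    """All distinct class indices that n binary hidden neurons can encode."""
    return sorted({class_index(h) for h in product((0, 1), repeat=n_hidden)})


for n in (1, 2, 3):
    print(n, "hidden neurons ->", len(reachable_classes(n)), "classes")
```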
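But the neural network does not compute the addition. It treats it as a classification problem based on what it has learned, and it will never be able to produce a correct answer for values outside its learning base. During the learning phase, it adjusts the weights in order to place the separators (lines in 2D) so that they produce the correct answers. If your inputs are in [0,10], it will learn to produce correct answers for additions of values in [0,10]^2, but it will never give a correct answer for 12+11.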

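If your last values are well learned and the first ones forgotten, try lowering the learning rate: the weight modifications (which depend on the gradient) made for the last examples may override those made for the first ones (if you are using stochastic backprop). Make sure that your learning base is fair. You can also present the badly learned examples more often. And try many values of the learning rate until you find a good one.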
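A minimal sketch of that advice, under stated assumptions: it trains a single linear neuron (not the 2-3-1 tanh network from the question) with per-example stochastic gradient descent, reshuffles the examples every epoch so that no fixed group is always presented last, and uses a learning rate of 0.05 instead of 0.25. The function name, sample count, epoch count, and seed are all illustrative choices.

```python
import random


def train_adder(lr=0.05, epochs=50, n_samples=200, seed=0):
    """Per-example SGD on a linear neuron y = w1*x1 + w2*x2 + b with target x1 + x2."""
    rng = random.Random(seed)
    # Inputs drawn from the normalized range used in the question.
    data = [(rng.uniform(-0.8, 0.8), rng.uniform(-0.8, 0.8)) for _ in range(n_samples)]
    w1, w2, b = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # new presentation order every epoch
        for x1, x2 in data:
            err = (w1 * x1 + w2 * x2 + b) - (x1 + x2)
            # Gradient step for the squared error 0.5 * err**2.
            w1 -= lr * err * x1
            w2 -= lr * err * x2
            b -= lr * err
    return w1, w2, b


w1, w2, b = train_adder()
print(f"w1={w1:.6f} w2={w2:.6f} b={b:.6f}")
```

In this setup the weights should settle close to {1, 1} with a bias close to 0, which is the exact adder.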
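It may be too late, but a simple solution is to use an RNN (recurrent neural network).

After you convert your numbers to digits, the network takes one pair of digits at a time from the sequence, from left to right (this works if the sequence starts with the least significant digit, so that the carry moves forward).
The RNN has to loop one of its outputs back so that it can automatically understand that there is a digit to carry (if the sum is 2, write a 0 and carry 1).
To train it, give it inputs made of two digits (one from the first number, the second from the second number) together with the desired output. The RNN will end up finding out how to do the sum.
Note that this RNN only needs to know the following 8 cases to learn how to sum two numbers:

1+1, 0+0, 1+0, 0+1 with carry

1+1, 0+0, 1+0, 0+1 without carry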
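The eight cases listed above form the truth table of a binary full adder. The sketch below writes that table out explicitly and applies it one bit pair at a time, starting from the least significant bit, with the carry playing the role of the looped-back output. It is a hand-written lookup, not a trained RNN, and the names `FULL_ADDER` and `add_binary` are illustrative.

```python
# (bit_a, bit_b, carry_in) -> (sum_bit, carry_out): the eight cases.
FULL_ADDER = {
    (a, b, c): ((a + b + c) % 2, (a + b + c) // 2)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}


def add_binary(x: int, y: int) -> int:
    """Add two non-negative integers bit by bit using only the lookup table."""
    carry, result, pos = 0, 0, 0
    while x >> pos or y >> pos or carry:
        a, b = (x >> pos) & 1, (y >> pos) & 1
        s, carry = FULL_ADDER[(a, b, carry)]  # carry is fed back into the next step
        result |= s << pos
        pos += 1
    return result


print(add_binary(5, 3))
```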

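I tried to do the same thing. I trained on 2-, 3-, and 4-digit addition and reached 97% accuracy. You can achieve this with one type of neural network.

A Jupyter Notebook example program for Keras is available at the following link:

Hope it helps.
The code is attached here for reference.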

from keras.models import Sequential
from keras import layers
import numpy as np


class CharacterTable(object):
    """Given a set of characters:
    + Encode them to a one hot integer representation
    + Decode the one hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """
    def __init__(self, chars):
        """Initialize character table.

        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = dict((c, i) for i, c in enumerate(self.chars))
        self.indices_char = dict((i, c) for i, c in enumerate(self.chars))

    def encode(self, C, num_rows):
        """One hot encode given string C.

        # Arguments
            num_rows: Number of rows in the returned one hot encoding. This is
                used to keep the # of rows for each data the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        # With calc_argmax=True, x holds one-hot rows or probabilities;
        # otherwise x already holds character indices.
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[x] for x in x)


class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'


# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
INVERT = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)

questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
    f = lambda: int(''.join(np.random.choice(list('0123456789'))
                            for i in range(np.random.randint(1, DIGITS + 1))))
    a, b = f(), f()
    # Skip any addition questions we've already seen
    # Also skip any such that x+Y == Y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    if INVERT:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print('Total addition questions:', len(questions))

print('Vectorization...')
# The np.bool alias was removed from recent NumPy releases; the builtin bool
# gives the same dtype.
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

# Try replacing GRU, or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use input_shape=(None, num_feature).
model.add(RNN(HIDDEN_SIZE, input_shape=(MAXLEN, len(chars))))
# As the decoder RNN's input, repeatedly provide with the last hidden state of
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as TimeDistributed in the below expects
    # the first dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))

# Apply a dense layer to the every temporal slice of an input. For each of step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars))))
model.add(layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        # Sequential.predict_classes is gone from recent Keras releases;
        # taking the argmax of predict() yields the same class indices.
        preds = np.argmax(model.predict(rowx, verbose=0), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if INVERT else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)
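To make the data preparation in the script easier to follow, here is a small dependency-free sketch of the same idea: pad the question to a fixed length, reverse it, one-hot encode each character, and decode it back by taking the position of the largest value in each row. It uses plain lists instead of NumPy, and the helper names (`make_query`, `one_hot`, `decode`) are illustrative rather than part of the script or of Keras.

```python
# Same alphabet as the script: ten digits, the plus sign, and a padding space.
CHARS = sorted(set('0123456789+ '))
CHAR_INDEX = {c: i for i, c in enumerate(CHARS)}


def make_query(a, b, digits=3, invert=True):
    """Build the padded (and optionally reversed) question string for a + b."""
    maxlen = digits + 1 + digits
    q = '{}+{}'.format(a, b)
    q = q + ' ' * (maxlen - len(q))
    return q[::-1] if invert else q


def one_hot(text):
    """One row per character, with a single 1 at that character's index."""
    rows = []
    for ch in text:
        row = [0] * len(CHARS)
        row[CHAR_INDEX[ch]] = 1
        rows.append(row)
    return rows


def decode(rows):
    """Map each row back to the character at the position of its largest value."""
    return ''.join(CHARS[row.index(max(row))] for row in rows)


query = make_query(12, 345)
print(repr(query))
print(repr(decode(one_hot(query))))
```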
