Training a neural network to perform addition
I need to train a network to multiply or add 2 inputs, but even after 20000 iterations it does not approximate all points well. More specifically, I train it on the whole dataset, and it gets close on the last points, but it does not seem to improve on the first ones. I normalize the data so that it lies between -0.8 and 0.8. The network itself consists of 2 inputs, 3 hidden neurons and 1 output neuron. I set the learning rate to 0.25 and use tanh(x) as the activation function. It approximates the points trained last in the dataset very closely, but it cannot approximate the first points well. I am wondering what keeps it from adjusting properly: is it the topology I am using, or something else?
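Also, how many neurons are appropriate for the hidden layer of this network?

Think about what happens if you replace the tanh(x) threshold function with a linear function of x (call it a.x) and treat a as the only learning parameter of each neuron. That is effectively what your network will be optimizing: a.x is an approximation of the tanh function around its zero crossing, where tanh(x) ≈ x for small x.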
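A minimal numerical sketch of that linearization: it compares tanh(x) with x at a few points, including 0.8, the edge of the range the question normalizes to, and then chains two of the simplified `a.x` neurons. `linear_neuron` is a hypothetical name for the simplified neuron, not part of any library.

```python
import math


def linear_neuron(a, x):
    # Simplified neuron: tanh(a * x) replaced by its linearization a * x,
    # with a as the only learnable parameter.
    return a * x


# Near its zero crossing tanh(x) is close to x (its slope at 0 is 1),
# so the error shrinks quickly as x approaches 0.
for v in (0.05, 0.1, 0.2, 0.8):
    err = abs(math.tanh(v) - v)
    print(f"x={v:.2f}  tanh(x)={math.tanh(v):.5f}  |tanh(x) - x|={err:.5f}")

# Chaining two such neurons: the signal is scaled by a1, then by a2,
# so the stacked pair computes (a1 * a2) * x, a product of the parameters.
a1, a2, x = 0.5, 4.0, 0.25
stacked = linear_neuron(a2, linear_neuron(a1, x))
print(stacked, (a1 * a2) * x)
```

The printed errors make the point concrete: the linear picture is accurate near 0 and degrades toward the ends of the normalized range.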
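Now, what happens when you stack layers of these linear neurons? As the signal travels from input to output, you multiply by each neuron's parameter in turn. You are trying to approximate addition with a chain of multiplications. That, as they say, does not compute.

A network consisting of a single neuron with weights {1,1}, bias 0 and a linear activation function performs the addition of its two input numbers.

Multiplication may be harder. Here are two approaches a network can use: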
- … a variable number of neurons proportional to the length (logarithm) of b.
- Work in the log domain: $a \cdot b = \exp(\ln a + \ln b)$.
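A minimal sketch of the single addition neuron (weights {1,1}, bias 0, linear activation) and of the same neuron reused in the log domain. It assumes positive inputs, since ln is undefined otherwise, and `math.log`/`math.exp` stand in for the sub-networks that would have to approximate the logarithm and the exponential; `linear_neuron`, `add` and `multiply_positive` are hypothetical helper names.

```python
import math


def linear_neuron(inputs, weights, bias=0.0):
    # Weighted sum of the inputs plus bias, with a linear (identity) activation.
    return sum(w * v for w, v in zip(weights, inputs)) + bias


def add(a, b):
    # One neuron with weights {1, 1} and bias 0 computes a + b exactly.
    return linear_neuron([a, b], [1.0, 1.0])


def multiply_positive(a, b):
    # Log-domain trick: a * b = exp(ln(a) + ln(b)), valid only for a, b > 0.
    # math.log / math.exp are placeholders for learned approximations.
    return math.exp(add(math.log(a), math.log(b)))


print(add(3, 4))
print(multiply_positive(3.0, 4.0))
```

The only learned piece in the multiplication path is the same adder; all the difficulty moves into approximating ln and exp.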
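The log-domain network can handle numbers of any length, as long as it approximates the logarithm and the exponential well enough.

If you want to keep things neural (links have weights, each neuron computes the weighted sum of its inputs and answers 0 or 1 according to the sigmoid of that sum, and learning is gradient backpropagation), then you should think of the hidden-layer neurons as classifiers. Each one defines a line that splits the input space into two classes: one class is the region where the neuron answers 1, the other the region where it answers 0. A second hidden neuron defines another split, and so on. The output neuron combines the hidden-layer outputs by adjusting its weights so that its output matches the targets presented during learning.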
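A minimal sketch of hidden neurons acting as line classifiers in a 2-D input space. The weights are hand-picked for illustration, not learned, and `threshold_neuron`, `region` and `class_index` are hypothetical names. A neuron answers 1 exactly when sigmoid(w · x + b) > 0.5, i.e. when w · x + b > 0, so its boundary is the line w · x + b = 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def threshold_neuron(x, w, b):
    # Answers 1 when sigmoid(w . x + b) > 0.5, which is the same as w . x + b > 0.
    # The boundary w[0]*x[0] + w[1]*x[1] + b = 0 is a line in the input plane.
    z = w[0] * x[0] + w[1] * x[1] + b
    return 1 if sigmoid(z) > 0.5 else 0


# Two hand-picked hidden neurons: h1 splits along x1 + x2 = 5,
# h2 splits along x1 - x2 = 0.
hidden = [((1.0, 1.0), -5.0), ((1.0, -1.0), 0.0)]


def region(x):
    # Each hidden neuron contributes one bit (h1, h2).
    return tuple(threshold_neuron(x, w, b) for w, b in hidden)


def class_index(x):
    # Read the bits as h1*2^0 + h2*2^1: two neurons give up to 4 classes.
    return sum(h * 2 ** i for i, h in enumerate(region(x)))


points = [(1, 0), (0, 1), (6, 1), (1, 6)]
for p in points:
    print(p, region(p), class_index(p))
```

Each of the four sample points lands in a different region, which is the "2 neurons, 4 classes" count in concrete form.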
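So a single neuron splits the input space into 2 classes (each of which may correspond to one addition, depending on the training data). Two neurons can define 4 classes, three neurons 8 classes, and so on. Treat the outputs of the hidden neurons as binary digits weighted by powers of 2: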
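$h_1 2^0 + h_2 2^1 + \dots + h_n 2^{n-1}$, where $h_i$ is the output of hidden neuron $i$. NB: you need n output neurons. This answers the question about how many hidden neurons to use. But the neural network does not compute the addition. It treats it as a classification problem based on what it has learned, and it will never be able to produce correct answers for values outside its training data. During learning, it adjusts the weights so as to place the separators (lines in 2D) where they produce the correct answers. If your inputs are in $[0,10]$, it will learn to produce correct answers for additions of values in $[0,10]^2$, but it will never give the correct answer for 12 + 11.

If your last values are learned well and the first ones are forgotten, try lowering the learning rate: the weight updates (which depend on the gradient) for the last examples may overwrite those for the first ones (if you are using stochastic backprop). Make sure your training set is balanced. You can also present the badly learned examples more often. And try several learning-rate values until you find a good one.

It may be too late, but a simple solution is to use an RNN (recurrent neural network). After converting the numbers to binary digits, the network reads one digit from each number at a time, starting from the least significant digit so that the carry can propagate upward. The RNN must feed one of its outputs back into itself so that it can learn that there is a digit to carry (if the sum is 2, write 0 and carry 1). To train it, you give it inputs made of two digits (one from the first number, one from the second) together with the desired output. The RNN will eventually figure out how to do the sum. Note that to learn how to add two binary numbers, this RNN only needs to know the following 8 cases: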
- 1+1, 0+0, 1+0, 0+1 with an incoming carry
- 1+1, 0+0, 1+0, 0+1 without an incoming carry
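To make the 8 cases concrete, here is a plain-Python sketch of the function the RNN has to learn. It is not an RNN: the explicit `carry` variable plays the role of the output the RNN loops back into itself, and `full_adder` and `serial_add` are hypothetical names used only for illustration.

```python
from itertools import product


def full_adder(a_bit, b_bit, carry_in):
    # One step of the recurrence: returns (output digit, carry fed back).
    total = a_bit + b_bit + carry_in
    return total % 2, total // 2


# The 8 cases: 4 digit pairs, each with and without an incoming carry.
for carry_in, a_bit, b_bit in product((0, 1), repeat=3):
    digit, carry_out = full_adder(a_bit, b_bit, carry_in)
    print(f"{a_bit}+{b_bit} carry_in={carry_in} -> digit={digit} carry_out={carry_out}")


def serial_add(a, b):
    # Feed the digits least significant first, carrying state between steps.
    # One extra step leaves room for a final carry out of the top digit.
    width = max(a.bit_length(), b.bit_length()) + 1
    carry, result = 0, 0
    for i in range(width):
        digit, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= digit << i
    return result


print(serial_add(11, 12))
```

Because the same 8-entry rule is applied at every step, a network that has learned it can in principle add binary numbers of any length, unlike the classifier view described earlier.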
Adapted from the Keras `addition_rnn.py` example, which trains an LSTM encoder-decoder to add numbers written as character strings:

```python
import numpy as np
from keras import layers
from keras.models import Sequential


class CharacterTable:
    """Given a set of characters:
    + Encode them to a one-hot integer representation
    + Decode the one-hot integer representation to their character output
    + Decode a vector of probabilities to their character output
    """

    def __init__(self, chars):
        """Initialize character table.

        # Arguments
            chars: Characters that can appear in the input.
        """
        self.chars = sorted(set(chars))
        self.char_indices = {c: i for i, c in enumerate(self.chars)}
        self.indices_char = {i: c for i, c in enumerate(self.chars)}

    def encode(self, C, num_rows):
        """One-hot encode the given string C.

        # Arguments
            C: String to be encoded.
            num_rows: Number of rows in the returned one-hot encoding. This is
                used to keep the number of rows for each sample the same.
        """
        x = np.zeros((num_rows, len(self.chars)))
        for i, c in enumerate(C):
            x[i, self.char_indices[c]] = 1
        return x

    def decode(self, x, calc_argmax=True):
        """Decode a one-hot matrix (or a vector of indices) back to a string."""
        if calc_argmax:
            x = x.argmax(axis=-1)
        return ''.join(self.indices_char[int(i)] for i in x)


class colors:
    ok = '\033[92m'
    fail = '\033[91m'
    close = '\033[0m'


# Parameters for the model and dataset.
TRAINING_SIZE = 50000
DIGITS = 3
INVERT = True

# Maximum length of input is 'int + int' (e.g., '345+678'). Maximum length of
# int is DIGITS.
MAXLEN = DIGITS + 1 + DIGITS

# All the numbers, plus sign and space for padding.
chars = '0123456789+ '
ctable = CharacterTable(chars)


def random_number():
    """Random integer with between 1 and DIGITS decimal digits."""
    return int(''.join(np.random.choice(list('0123456789'))
                       for _ in range(np.random.randint(1, DIGITS + 1))))


questions = []
expected = []
seen = set()
print('Generating data...')
while len(questions) < TRAINING_SIZE:
    a, b = random_number(), random_number()
    # Skip any addition questions we've already seen.
    # Also skip any such that x+Y == Y+x (hence the sorting).
    key = tuple(sorted((a, b)))
    if key in seen:
        continue
    seen.add(key)
    # Pad the data with spaces such that it is always MAXLEN.
    q = '{}+{}'.format(a, b)
    query = q + ' ' * (MAXLEN - len(q))
    ans = str(a + b)
    # Answers can be of maximum size DIGITS + 1.
    ans += ' ' * (DIGITS + 1 - len(ans))
    if INVERT:
        # Reverse the query, e.g., '12+345 ' becomes ' 543+21'. (Note the
        # space used for padding.)
        query = query[::-1]
    questions.append(query)
    expected.append(ans)
print('Total addition questions:', len(questions))

print('Vectorization...')
x = np.zeros((len(questions), MAXLEN, len(chars)), dtype=bool)
y = np.zeros((len(questions), DIGITS + 1, len(chars)), dtype=bool)
for i, sentence in enumerate(questions):
    x[i] = ctable.encode(sentence, MAXLEN)
for i, sentence in enumerate(expected):
    y[i] = ctable.encode(sentence, DIGITS + 1)

# Shuffle (x, y) in unison as the later parts of x will almost all be larger
# digits.
indices = np.arange(len(y))
np.random.shuffle(indices)
x = x[indices]
y = y[indices]

# Explicitly set apart 10% for validation data that we never train over.
split_at = len(x) - len(x) // 10
(x_train, x_val) = x[:split_at], x[split_at:]
(y_train, y_val) = y[:split_at], y[split_at:]

print('Training Data:')
print(x_train.shape)
print(y_train.shape)

print('Validation Data:')
print(x_val.shape)
print(y_val.shape)

# Try replacing LSTM with GRU or SimpleRNN.
RNN = layers.LSTM
HIDDEN_SIZE = 128
BATCH_SIZE = 128
LAYERS = 1

print('Build model...')
model = Sequential()
# "Encode" the input sequence using an RNN, producing an output of HIDDEN_SIZE.
# Note: In a situation where your input sequences have a variable length,
# use shape=(None, num_feature).
model.add(layers.Input(shape=(MAXLEN, len(chars))))
model.add(RNN(HIDDEN_SIZE))
# As the decoder RNN's input, repeatedly provide the last hidden state of the
# RNN for each time step. Repeat 'DIGITS + 1' times as that's the maximum
# length of output, e.g., when DIGITS=3, max output is 999+999=1998.
model.add(layers.RepeatVector(DIGITS + 1))
# The decoder RNN could be multiple layers stacked or a single layer.
for _ in range(LAYERS):
    # By setting return_sequences to True, return not only the last output but
    # all the outputs so far in the form of (num_samples, timesteps,
    # output_dim). This is necessary as TimeDistributed below expects
    # the first dimension to be the timesteps.
    model.add(RNN(HIDDEN_SIZE, return_sequences=True))
# Apply a dense layer to every temporal slice of the input. For each step
# of the output sequence, decide which character should be chosen.
model.add(layers.TimeDistributed(layers.Dense(len(chars))))
model.add(layers.Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
model.summary()

# Train the model each generation and show predictions against the validation
# dataset.
for iteration in range(1, 200):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(x_train, y_train,
              batch_size=BATCH_SIZE,
              epochs=1,
              validation_data=(x_val, y_val))
    # Select 10 samples from the validation set at random so we can visualize
    # errors.
    for i in range(10):
        ind = np.random.randint(0, len(x_val))
        rowx, rowy = x_val[np.array([ind])], y_val[np.array([ind])]
        # Most likely character index at each output step.
        preds = np.argmax(model.predict(rowx, verbose=0), axis=-1)
        q = ctable.decode(rowx[0])
        correct = ctable.decode(rowy[0])
        guess = ctable.decode(preds[0], calc_argmax=False)
        print('Q', q[::-1] if INVERT else q, end=' ')
        print('T', correct, end=' ')
        if correct == guess:
            print(colors.ok + '☑' + colors.close, end=' ')
        else:
            print(colors.fail + '☒' + colors.close, end=' ')
        print(guess)
```