Python Caffe: Can't seem to learn the y=x^2 function
I am trying to train a neural network to learn the function y=x^2 in the deep learning framework Caffe. Here is my code:
Data generation code:
import numpy as np
import lmdb
import caffe
Ntrain = 100
Ntest = 20
K = 1
H = 1
W = 1
Xtrain = np.uint8(np.random.randint(0, 256, size=(Ntrain,K,H,W)))
Xtest = np.uint8(np.random.randint(0, 256, size=(Ntest,K,H,W)))
ytrain = np.zeros(Ntrain, dtype=np.int32)
ytest = np.zeros(Ntest, dtype=np.int32)
# Labels are the squares of the inputs; cast to int first to avoid uint8 overflow
for i in range(Xtrain.shape[0]):
    ytrain[i] = int(Xtrain[i,0,0,0]) * int(Xtrain[i,0,0,0])
for i in range(Xtest.shape[0]):
    ytest[i] = int(Xtest[i,0,0,0]) * int(Xtest[i,0,0,0])
# Write the training set to LMDB, one Datum per sample
env = lmdb.open('expt/expt_train')
for i in range(Ntrain):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtrain.shape[1]
    datum.height = Xtrain.shape[2]
    datum.width = Xtrain.shape[3]
    datum.data = Xtrain[i].tobytes()
    datum.label = int(ytrain[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
env.close()
# Write the test set to LMDB in the same way
env = lmdb.open('expt/expt_test')
for i in range(Ntest):
    datum = caffe.proto.caffe_pb2.Datum()
    datum.channels = Xtest.shape[1]
    datum.height = Xtest.shape[2]
    datum.width = Xtest.shape[3]
    datum.data = Xtest[i].tobytes()
    datum.label = int(ytest[i])
    str_id = '{:08}'.format(i)
    with env.begin(write=True) as txn:
        txn.put(str_id.encode('ascii'), datum.SerializeToString())
env.close()
Solver file:
net: "expt/expt.prototxt"
max_iter: 500
test_iter: 20
test_interval: 100
display: 100
base_lr: 0.001
momentum: 0.9
lr_policy: "inv"
snapshot_prefix: "expt/expt"
snapshot_diff: true
solver_mode: CPU
solver_type: SGD
debug_info: true
Caffe model:
name: "expt"
layer {
  name: "Expt_Data_Train"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "expt/expt_train"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "Expt_Data_Validate"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "expt/expt_test"
    backend: LMDB
    batch_size: 1
  }
}
layer {
  name: "IP1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  inner_product_param {
    num_output: 2
  }
}
layer {
  name: "IP2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 1
  }
}
layer {
  name: "Loss"
  type: "EuclideanLoss"
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}
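I get an error on the order of 10^8, which is unbelievable. The network should take a single input and produce a single output. The inputs are integers in the range [0, 255], and the output is the square of each input. Any idea why the error is so large?

When the weights are all initialized to zero, they do not get updated. Try adding weight_filler { type: "gaussian" } (or "xavier") to the inner_product_param clause of both IP layers. You will also need to lower the learning rate (e.g., base_lr: 1e-9) and add regularization to the solver (e.g., weight_decay: 0.0005), otherwise you will run into inf weights and nan outputs. This does not really fix the high loss, but there are signs that the network is learning to approximate the function. 2 hidden nodes may be too few to approximate x^2.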
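The two claims in the answer (zero-initialized weights receive no updates, and a learning rate of 0.001 blows up on unscaled inputs) can be checked with a small NumPy sketch. This is not Caffe code. It is a hand-written stand-in that assumes the same IP1 (2 units) -> IP2 (1 unit) layout with a loss of 0.5 * (ip2 - y)^2, plain SGD without momentum or weight decay, and a single training sample. The names `gradients` and `sgd_single_weight` are hypothetical helpers introduced only for illustration.

```python
import math
import numpy as np

def gradients(x, y, W1, b1, W2, b2):
    """Backprop for loss = 0.5 * (ip2 - y)**2, ip1 = W1*x + b1, ip2 = W2.ip1 + b2."""
    ip1 = W1 * x + b1        # two hidden units, scalar input
    ip2 = W2 @ ip1 + b2      # single output
    d_ip2 = ip2 - y          # dLoss/d(ip2)
    dW2 = d_ip2 * ip1        # zero whenever ip1 is zero
    db2 = d_ip2
    d_ip1 = d_ip2 * W2       # zero whenever W2 is zero
    dW1 = d_ip1 * x
    db1 = d_ip1
    return dW1, db1, dW2, db2

x, y = 10.0, 100.0
zeros = np.zeros(2)

# Zero-initialized weights and biases: only the output bias gets a gradient.
zero_dW1, zero_db1, zero_dW2, zero_db2 = gradients(x, y, zeros, zeros, zeros, 0.0)

# Small Gaussian weights (std 0.01 is an arbitrary choice for this sketch).
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, size=2)
W2 = rng.normal(0.0, 0.01, size=2)
rand_dW1, _, rand_dW2, _ = gradients(x, y, W1, zeros, W2, 0.0)

def sgd_single_weight(lr, steps=500, x=255.0, y=255.0 ** 2):
    """Plain SGD on one weight w for the model w*x, repeatedly seeing the sample (x, y)."""
    w = 0.0
    for _ in range(steps):
        w = w - lr * (w * x - y) * x   # each step scales the error by (1 - lr * x**2)
    return w

print("zero init:    dW1 =", zero_dW1, " dW2 =", zero_dW2, " db2 =", zero_db2)
print("gaussian init: dW1 =", rand_dW1, " dW2 =", rand_dW2)
print("lr=1e-3 ->", sgd_single_weight(1e-3))
print("lr=1e-9 ->", sgd_single_weight(1e-9))
```

With x = 255 the per-step error factor is 1 - lr * 255^2, which has magnitude well above 1 for lr = 1e-3 and just below 1 for lr = 1e-9. That is why the first run ends in a non-finite value while the second stays finite but moves slowly. Note also that, in this sketch as in the posted model, there is no nonlinearity between the two inner product layers, so the network output is an affine function of x regardless of the number of hidden nodes.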