Python mini-batch gradient descent: gradients explode after a few epochs

Tags: python, tensorflow, neural-network, mini-batch

I trained a model with mini-batch gradient descent, expecting it to converge to the RMSE of the direct (closed-form) solution, which is about 0.00016. The RMSE on the validation set (rmse_valid_array in the function below) looks fine for the first epoch, but after a few epochs it starts to explode. I have been struggling with this for days; the algorithm seems fine, so where is the problem?

P.S. X_train has shape (11000, 41) and y_train has shape (11000, 1); the batch size here is 1 and the learning rate is 0.001. I initialize the weights to be very small (divided by 1000). I have checked that X_mini and y_mini are fine; the gradient starts to explode after a few epochs.

P.S. Following Andrew Ng's lectures, when I change 1/len(y) (the size of each batch) to 1/m (the size of the whole training set), the RMSE does get smaller every epoch, but not with the trend he shows in his mini-batch lecture.
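For what it's worth, the two normalizations differ only by a constant factor on the step: dividing by len(y) gives the mean gradient over the batch, while dividing by m shrinks every update by batch_size/m, so the 1/m version effectively runs with a learning rate 11000× smaller here, which can hide a divergence rather than fix it. A quick sketch with made-up data (all shapes and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X_mini = rng.normal(size=(8, 3))   # one hypothetical mini-batch
y_mini = rng.normal(size=(8, 1))
w = np.zeros((1, 3))
m = 11000                          # whole-training-set size

residual = X_mini @ w.T - y_mini   # y_pred - y_mini
g_batch = (X_mini.T @ residual / len(y_mini)).reshape(1, -1)  # 1/len(y)
g_full = (X_mini.T @ residual / m).reshape(1, -1)             # 1/m

# identical direction; 1/m only scales the step down by len(y_mini)/m
assert np.allclose(g_full * (m / len(y_mini)), g_batch)
```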

[0.003352938483114684,
 0.014898628026733278,
 0.015708125817549583,
 0.15904084037991562,
 0.9772361042313762,
 17.776216375980052,
 187.04333942512542,
 978.648663972064,
 17383.631549616875,
 103997.59758713894,
 2222088.2561604036,
 23334640.70860544,
 118182306.23839562,
 2606049599.35717,
 18920677325.736164,
 261342486636.4693,
 1738434547629.957,
 10577420781634.316,
 164217272049684.75,
 1131726496072944.8,
 1.6219370161174172e+16,
 2.4623815536311107e+17,
...
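The blow-up above is roughly geometric: each epoch multiplies the error by an approximately constant factor, which is the signature of a step size above the stability limit rather than a coding mistake in the batching. A minimal 1-D sketch of that mechanism (all numbers hypothetical):

```python
# 1-D least squares: loss = 0.5 * lam * w**2, gradient = lam * w
lam, lr, w = 3000.0, 0.001, 1e-3   # lr * lam = 3 > 2, so GD diverges
history = []
for _ in range(6):
    w = w - lr * lam * w           # update factor is (1 - lr * lam) = -2
    history.append(abs(w))
print(history)                     # |w| roughly doubles every step
```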
Here is the main function that runs the mini-batch training:

import time

import numpy as np
from sklearn.metrics import mean_squared_error


def mini_batch_GD(X_train, X_valid, y_train, y_valid, batch_size, lr, CT):
  m = len(y_train)
  n = X_train.shape[1]
  # initialize the weights with small random values
  w = (np.random.random(n)).reshape(1, -1) / 1000
  
  rmse_train_array = []
  rmse_valid_array = []
  time_epoch = []
  
  for epoch in range(0, 100):
    start_time = time.time()

    # shuffle batches
    mini_batches = create_minibatches(X_train, y_train, batch_size)
    for mini_batch in mini_batches:
      X_mini, y_mini = mini_batch
      y_pred = np.dot(X_mini, w.T).reshape(-1, 1)
      gradient = (1/len(y_pred) * np.dot(X_mini.T, y_pred - y_mini)).reshape(1, -1)
      w = w - lr * gradient
    
    # training rmse
    y_pred_train = np.dot(X_train, w.T).reshape(-1, 1)
    rmse_train_array.append(rmse(y_pred_train, y_train))

    # valid rmse
    y_pred_valid = np.dot(X_valid, w.T).reshape(-1, 1)
    rmse_valid_array.append(rmse(y_pred_valid, y_valid))

    # time for each epoch
    time_epoch.append(time.time() - start_time)

    # check for convergence
    if rmse(y_pred_valid, y_valid) <= CT:
      break
    
  return w, rmse_train_array, rmse_valid_array, time_epoch

def create_minibatches(X, y, batch_size):
  # stack X and y so shuffling keeps rows aligned
  data = np.hstack((X, y))
  np.random.shuffle(data)
  n_samples = data.shape[0]
  n_full = n_samples // batch_size
  mini_batches = []

  for i in range(n_full):
    mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
    X_mini = mini_batch[:, :-1]
    y_mini = mini_batch[:, -1].reshape((-1, 1))
    mini_batches.append((X_mini, y_mini))
  # leftover rows that do not fill a complete batch
  if n_samples % batch_size != 0:
    mini_batch = data[n_full * batch_size:, :]
    X_mini = mini_batch[:, :-1]
    y_mini = mini_batch[:, -1].reshape((-1, 1))
    mini_batches.append((X_mini, y_mini))
  return mini_batches

def rmse(yPred, y):
  return np.sqrt(mean_squared_error(yPred, y))
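Independent of the code itself, one common cause of exactly this behaviour is unstandardized features: for least squares, gradient descent is only stable while lr < 2/λ_max, where λ_max is the largest eigenvalue of XᵀX/m, and mini-batch noise makes the effective limit tighter still. A sketch of the check on synthetic data (the scale factor is hypothetical, to stand in for unscaled columns):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 41)) * 100.0   # synthetic, unscaled features

# gradient descent on least squares is stable only for lr < 2 / lam_max
lam_max = np.linalg.eigvalsh(X.T @ X / len(X)).max()
lr_limit = 2.0 / lam_max
print(lr_limit)   # well below 0.001 for features this large
```

If lr_limit comes out below your learning rate, standardizing the columns of X (zero mean, unit variance) before training is the usual remedy.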