TensorFlow CNN for NLP in Python won't converge


I am trying to build a neural network based on the sentence classification model proposed by Yoon Kim in this paper. I built it in TensorFlow Keras, using padded sentences of lemmatized words as input and three categories (positive, neutral, or negative) as output.
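For context, the inputs are prepared roughly along these lines. This is only a minimal sketch: the toy vocabulary, the sentence length of 50, and the random 300-dimensional embedding matrix are assumptions for illustration, not the actual preprocessing used here.

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

sent_size = 50                                         # assumed maximum sentence length
vocab = {"<pad>": 0, "movie": 1, "be": 2, "great": 3}  # toy vocabulary, index 0 reserved for padding

sentences = [["movie", "be", "great"]]                 # already lemmatized tokens
sequences = [[vocab[w] for w in s] for s in sentences]

X_data = pad_sequences(sequences, maxlen=sent_size, padding='post')  # (num_sentences, sent_size)
y_data = to_categorical([2], num_classes=3)                          # (num_sentences, 3) one-hot labels

# embedding matrix: one row per vocabulary entry, one column per embedding dimension
embedding_matrix = np.random.rand(len(vocab), 300).astype('float32')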

Here is the model I built:

import numpy as np
from tensorflow.keras.layers import (Input, Embedding, Conv1D, GlobalMaxPool1D,
                                     Dense, Dropout, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2
from tensorflow.keras.constraints import max_norm

def create_CNN_model(window_sizes, feature_maps, sent_size, num_categs, embedding_matrix: np.array):
  inputs = Input(shape=(sent_size,), dtype='float32', name='text_inputs')  # (BATCH_SIZE, sent_size)

  # initialize the embeddings with my own embedding matrix
  embed = Embedding(embedding_matrix.shape[0], embedding_matrix.shape[1],
                    mask_zero=True, input_length=sent_size,
                    weights=[embedding_matrix])(inputs)  # (BATCH_SIZE, sent_size, embedding_dim)

  # collect the max-pooled feature vectors
  ta = []

  # as we have multiple window sizes:
  for n_window in window_sizes:
    con = Conv1D(feature_maps, n_window, padding='causal',
                 activation="relu", use_bias=True)(embed)  # (BATCH_SIZE, sent_size, feature_maps)
    # the convolved tensor contains one feature map per filter for this window size
    pooled = GlobalMaxPool1D(data_format='channels_last')(con)  # (BATCH_SIZE, feature_maps)
    # max pooling keeps the maximum of each feature map, reducing the rank of the tensor
    ta.append(pooled)

  concat = concatenate(ta, axis=1)
  dropped = Dropout(0.5)(concat)
  outputs = Dense(num_categs, activation="softmax", use_bias=True, kernel_regularizer=l2(l=3),
                  kernel_constraint=max_norm(3))(dropped)  # max-norm weight constraint of 3, as in Kim (2014)

  # create the model
  model = Model(inputs=[inputs], outputs=[outputs])

  # return the model
  return model
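For reference, a call to this constructor might look like the following. The window sizes and feature-map count are the values from Kim's paper; the vocabulary size and the random embedding matrix are stand-ins, not the actual ones used here.

import numpy as np

# hypothetical hyperparameters, following Kim (2014): windows of 3, 4, 5 words,
# 100 feature maps per window size; vocabulary and embeddings are stand-ins
vocab_size, embedding_dim, sent_size = 20000, 300, 50
embedding_matrix = np.random.rand(vocab_size, embedding_dim).astype('float32')

model = create_CNN_model(window_sizes=[3, 4, 5], feature_maps=100,
                         sent_size=sent_size, num_categs=3,
                         embedding_matrix=embedding_matrix)
model.summary()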
I tried training the model on only 200 sentences, just to see whether it would overfit the data. But instead of overfitting, the loss just bounces up and down. I have tried changing the learning rate to values as small as 1e-8, with no effect.
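For comparison, the same overfitting sanity check can be run with Keras's built-in training loop instead of the custom loop shown below. This is a minimal sketch, assuming model is the network built above and X_small/y_small are hypothetical arrays holding the 200 padded sentences and their one-hot labels.

from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy

# hypothetical sanity check: a model that can learn at all should reach
# near-perfect training accuracy on a tiny sample like this
model.compile(optimizer=Adam(learning_rate=1e-3),
              loss=CategoricalCrossentropy(),
              metrics=['categorical_accuracy'])
history = model.fit(X_small, y_small, batch_size=20, epochs=50, verbose=2)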

Here is the function I use for training:

import os
import numpy as np
import tensorflow as tf
from tensorflow import train
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy

def train_model(X_data, y_data, batch_sz, tf_model, max_patience, num_epochs, ln_rate):
    # Instantiate an optimizer to train the model.
    # optimizer = Adadelta(learning_rate=1e-3)
    optimizer = Adam(learning_rate=ln_rate)

    # Instantiate a loss function.
    loss_fn = CategoricalCrossentropy()

    # Prepare the metrics
    train_acc_metric = CategoricalAccuracy()
    val_acc_metric = CategoricalAccuracy()

    buffer_sz = len(X_data)
    patience = 0
    epochs = num_epochs
    last_val_acc = 0

    # Fix the random state for better reproducibility
    np.random.seed(123)

    # Create the checkpoints
    ckpt = train.Checkpoint(step=tf.Variable(1), optimizer=optimizer,
                            model=tf_model)
    manager = train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)

    # Create directory to save the trained model
    path = "./saved_model"

    print("\n----------------------------------------------")
    if not os.path.isdir(path):
        try:
            os.mkdir(path)
        except OSError:
            print("\nCreation of the directory %s failed \n" % path)
        else:
            print("\nSuccessfully created the directory %s \n" % path)
    else:
        print("\nDirectory %s already exists" % path)

    print("\n----------------------------------------------")
    print("\nStarting run script...\n",
          "Model will be saved to ", path, "\n",
          "Checkpoints will be restored from and saved to .\tf_ckpts")

    # Save model prior to training
    tf_model.save("./saved_model/tf_model")

    # Restart from the last checkpoint, if available
    ckpt.restore(manager.latest_checkpoint)
    print("\n----------------------------------------------")
    if manager.latest_checkpoint:
        print("\nRestored from {}".format(manager.latest_checkpoint))
    else:
        print("\nInitializing from scratch.")

    # beginning of the training loop
    for epoch in range(epochs):
        print("\n----------------------------------------------")
        print('Start of epoch %d' % (epoch,))

        # re-shuffle data before each epoch
        np.random.shuffle(X_data)
        np.random.shuffle(y_data)

        # create the training dataset with 10-fold cross-validation
        train_dataset = make_dataset(X_data, y_data, 10)

        # Iterate over the batches of the dataset.
        for x_train, y_train, x_val, y_val in train_dataset:
            train_batches = tf.data.Dataset.from_tensor_slices((x_train, y_train))
            train_batches = train_batches.batch(batch_sz)

            for x_batch_train, y_batch_train in train_batches:
                with tf.GradientTape() as tape:
                    # calculate the forward pass
                    logits = tf_model(x_batch_train)

                    # assert that the output and true label tensor shapes are equal
                    get_shape = y_batch_train.shape
                    tf.debugging.assert_shapes(
                        [(logits, get_shape)],
                        data=(y_batch_train, logits),
                        summarize=3, message="Inconsistent shape (labels,output): ",
                        name="assert_shapes")

                    # calculate the loss
                    loss_value = loss_fn(y_batch_train, logits)

                    # add 1 to the step counter
                    ckpt.step.assign_add(1)

                    # Add extra losses created during this forward pass:
                    loss_value += sum(tf_model.losses)

                # calculate gradients
                grads = tape.gradient(loss_value, tf_model.trainable_weights)

                # backpropagate the gradients
                optimizer.apply_gradients(zip(grads, tf_model.trainable_weights))

                # Update training metric.
                train_acc_metric(y_batch_train, logits)

                # Save & log every 100 steps.
                if int(ckpt.step) % 100 == 0:
                    save_path = manager.save()
                    print("Saved checkpoint for step {}: {}".format(int(ckpt.step), save_path))
                    print("loss {:1.2f}".format(loss_value))
                    print('Seen so far: %s samples' % (int(ckpt.step) * batch_sz))

            # Run a cross-validation loop on each 10-fold dataset
            val_logits = tf_model(x_val)
            # Update val metrics
            val_acc_metric(y_val, val_logits)

        # Display metrics at the end of each epoch.
        train_acc = train_acc_metric.result()

        print('Training accuracy: ', float(train_acc))
        # Reset training metrics at the end of each epoch
        train_acc_metric.reset_states()

        print("----------")

        val_acc = val_acc_metric.result()
        print('Validation accuracy: ', float(val_acc))
        print("----------------------------------------------\n")
        val_acc_metric.reset_states()

        # Early stopping part
        if val_acc < last_val_acc:
            # If max_patience is exceeded, stop the training
            if patience >= max_patience:
                print("\n------------------------------------------------")
                print("Early stopping training to prevent over-fitting!")
                print("------------------------------------------------\n")
                break
            else:
                patience += 1

        # update the validation accuracy
        last_val_acc = val_acc

    # save the trained model
    tf_model.save("./saved_model/tf_model")

    print("\n------------------------------------------------")
    print("\nEnd of Training!\n")
----------------------------------------------

Successfully created the directory ./saved_model 


----------------------------------------------

Starting run script...
 Model will be saved to  ./saved_model 
 Checkpoints will be restored from and saved to .   f_ckpts
INFO:tensorflow:Assets written to: ./saved_model/tf_model/assets

----------------------------------------------

Initializing from scratch.

----------------------------------------------
Start of epoch 0
Training accuracy:  0.38999998569488525
----------
Validation accuracy:  0.38999998569488525
----------------------------------------------


----------------------------------------------
Start of epoch 1
Saved checkpoint for step 100: ./tf_ckpts/ckpt-1
loss 1.05
Seen so far: 2000 samples
Training accuracy:  0.4050000011920929
----------
Validation accuracy:  0.4050000011920929
----------------------------------------------


----------------------------------------------
Start of epoch 2
Saved checkpoint for step 200: ./tf_ckpts/ckpt-2
loss 1.10
Seen so far: 4000 samples
Training accuracy:  0.36000001430511475
----------
Validation accuracy:  0.36000001430511475
----------------------------------------------


----------------------------------------------
Start of epoch 3
Saved checkpoint for step 300: ./tf_ckpts/ckpt-3
loss 1.15
Seen so far: 6000 samples
Training accuracy:  0.375
----------
Validation accuracy:  0.375
----------------------------------------------


----------------------------------------------
Start of epoch 4
Saved checkpoint for step 400: ./tf_ckpts/ckpt-4
loss 1.17
Seen so far: 8000 samples
Training accuracy:  0.38999998569488525
----------
Validation accuracy:  0.38999998569488525
----------------------------------------------


----------------------------------------------
Start of epoch 5
Saved checkpoint for step 500: ./tf_ckpts/ckpt-5
loss 1.18
Seen so far: 10000 samples
Training accuracy:  0.3799999952316284
----------
Validation accuracy:  0.3799999952316284
----------------------------------------------


----------------------------------------------
Start of epoch 6
Saved checkpoint for step 600: ./tf_ckpts/ckpt-6
loss 1.09
Seen so far: 12000 samples
Training accuracy:  0.35499998927116394
----------
Validation accuracy:  0.35499998927116394
----------------------------------------------


----------------------------------------------
Start of epoch 7
Saved checkpoint for step 700: ./tf_ckpts/ckpt-7
loss 1.12
Seen so far: 14000 samples
Training accuracy:  0.3700000047683716
----------
Validation accuracy:  0.3700000047683716
----------------------------------------------
Any suggestions on how to get it to converge?