Tensorflow: loss is nan when using TFBertForSequenceClassification

tensorflow machine-learning keras deep-learning

I ran into a problem while training a multi-label text classification model. I'm working in Colab as follows:

def create_sentiment_bert():
  config = BertConfig.from_pretrained("monologg/kobert", num_labels=52)
  model = TFBertForSequenceClassification.from_pretrained("monologg/kobert", config=config, from_pt=True)
  opt = tf.keras.optimizers.Adam(learning_rate=4.0e-6)
  loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
  metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
  model.compile(optimizer=opt, loss=loss, metrics=[metric])
  return model

sentiment_model = create_sentiment_bert()

sentiment_model.fit(train_x, train_y, epochs=2, shuffle=True, batch_size=250, validation_data=(test_x, test_y))

The result is as follows:

Epoch 1/2
739/14065 [>....] - ETA: 35:31 - loss: nan - accuracy: 0.0000e+00

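I have checked my data: there are no nan, null, or invalid values.

I tried different optimizers, different numbers of epochs, and different learning rates, but they all had the same problem.

There are 52 labels, distributed as follows: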

[Label] [Count]
501     694624
601     651306
401     257665
210     250352
307     170665
301     153318
306     147948
201     141382
302     113917
402     102040
606     101434
506     73492
305     69876
604     62056
403     57956
104     56800
107     55503
607     40293
503     36272
505     34757
303     26884
308     24539
304     22135
205     20744
509     19465
206     16665
508     15334
208     13335
603     13240
504     12299
602     10684
202     10366
209     8267
106     6564
502     5880
211     5804
207     2794
507     1967
108     1860
204     1633
105     1545
109     682
605     426
102     276
101     274
405     268
212     204
213     153
103     103
203     90
404     65
608     37

I'm a beginner at this. Please help me. Thanks in advance.

Why do you have from_logits=False? The classifier head returns logits, so unless you add a softmax activation to the model, you need to compute the loss on logits.
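
To illustrate what "computing the loss on logits" means, here is a minimal pure-Python sketch of the per-example sparse cross-entropy. It is not the Keras implementation; the function name and the sample logits are hypothetical. In Keras the corresponding setting would be tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True).

```python
import math

def sparse_ce_from_logits(logits, label):
    """Cross-entropy of one example, computed directly on raw logits."""
    # Subtract the max logit for numerical stability (log-sum-exp trick).
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log(softmax(logits)[label]) equals logsumexp(logits) - logits[label].
    return log_sum_exp - logits[label]

# Hypothetical logits for a 3-class example; real values come from the model.
logits = [2.0, -1.0, 0.5]
loss = sparse_ce_from_logits(logits, label=0)

# With 52 equal logits the loss is log(52), the value of a uniform guess.
uniform_loss = sparse_ce_from_logits([0.0] * 52, label=3)
```

Raw logits can be negative or greater than 1, so they are not probabilities; the softmax step inside the loss is what turns them into a valid distribution before the log is taken.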

This has nothing to do with the optimizer or anything like that. You have to check the input to your model again. What are you using as input?

My data has three fields: (1) the text content (string, utf-8), (2) label1, with 5 distinct values (0 to 4), and (3) label3, with 52 distinct values (100 to 608). Training a model to classify (2) from (1) works fine, but the problem appears when training a model to classify (3) from (1).
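
One possible explanation, not confirmed in this thread: sparse categorical cross-entropy expects each label to be a class index in [0, num_labels), and with num_labels=52 the raw codes 100 to 608 fall outside that range. The TensorFlow documentation for the underlying sparse softmax op notes that out-of-range labels raise an exception on CPU and return NaN on GPU. Assuming that is what happens here, a minimal sketch of remapping the codes to contiguous indices could look like this (the helper name and sample list are hypothetical):

```python
def build_label_index(raw_labels):
    """Map each distinct raw label to a contiguous index in [0, num_classes)."""
    classes = sorted(set(raw_labels))
    label_to_index = {label: i for i, label in enumerate(classes)}
    return label_to_index, classes

# A few of the raw label codes from the question, used only as sample data.
raw_train_y = [501, 601, 401, 210, 501, 608, 101]
label_to_index, classes = build_label_index(raw_train_y)

# Indices that can be passed to the loss in place of the raw codes.
train_y_indexed = [label_to_index[label] for label in raw_train_y]

# classes[i] recovers the original code for a predicted index i.
recovered = [classes[i] for i in train_y_indexed]
```

The same label_to_index mapping would have to be applied to test_y as well, so that training and validation labels refer to the same classes.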