
How do I save a text classification model in TensorFlow?

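While reading about text classification, I came across the script listed below, which trains a text classification model (positive/negative). There is one thing I am not sure about: how do I save the model so that I can reuse it later? Also, how do I test it on an input test set of my own?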

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns



# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
  data = {}
  data["sentence"] = []
  data["sentiment"] = []
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      data["sentence"].append(f.read())
      data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
  pos_df = load_directory_data(os.path.join(directory, "pos"))
  neg_df = load_directory_data(os.path.join(directory, "neg"))
  pos_df["polarity"] = 1
  neg_df["polarity"] = 0
  return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download and process the dataset files.
def download_and_load_datasets(force_download=False):
  dataset = tf.keras.utils.get_file(
      fname="aclImdb.tar.gz", 
      origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz", 
      extract=True)

  train_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                       "aclImdb", "train"))
  test_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                      "aclImdb", "test"))

  return train_df, test_df

# Reduce logging output.
tf.logging.set_verbosity(tf.logging.ERROR)

train_df, test_df = download_and_load_datasets()
train_df.head()


# Training input on the whole training set with no limit on training epochs.
train_input_fn = tf.estimator.inputs.pandas_input_fn(
    train_df, train_df["polarity"], num_epochs=None, shuffle=True)

# Prediction on the whole training set.
predict_train_input_fn = tf.estimator.inputs.pandas_input_fn(
    train_df, train_df["polarity"], shuffle=False)
# Prediction on the test set.
predict_test_input_fn = tf.estimator.inputs.pandas_input_fn(
    test_df, test_df["polarity"], shuffle=False)


embedded_text_feature_column = hub.text_embedding_column(
    key="sentence", 
    module_spec="https://tfhub.dev/google/nnlm-en-dim128/1")


estimator = tf.estimator.DNNClassifier(
    hidden_units=[500, 100],
    feature_columns=[embedded_text_feature_column],
    n_classes=2,
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.003))

# Training for 1,000 steps means 128,000 training examples with the default
# batch size. This is roughly equivalent to 5 epochs since the training dataset
# contains 25,000 examples.
estimator.train(input_fn=train_input_fn, steps=1000)

train_eval_result = estimator.evaluate(input_fn=predict_train_input_fn)
test_eval_result = estimator.evaluate(input_fn=predict_test_input_fn)

print("Training set accuracy: {accuracy}".format(**train_eval_result))
print("Test set accuracy: {accuracy}".format(**test_eval_result))
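
Right now, every time I run the script above it rebuilds the whole model from scratch. I would like to reuse the trained model and run it on some sample texts of my own. How can I do that?

I tried saving it with tf.train.Saver, but that fails with the error:
ValueError: No variables to save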
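
As a side check on the comment in the training script, the relation between training steps and epochs can be sketched as follows. This assumes the default batch size of 128 used by tf.estimator.inputs.pandas_input_fn and the 25,000 training examples mentioned in that comment; epochs_seen is a hypothetical helper name, not part of TensorFlow.

```python
def epochs_seen(steps, batch_size, num_examples):
    """Approximate number of passes over the data after `steps` batches."""
    return steps * batch_size / num_examples


# 1,000 steps with 128 examples per batch over 25,000 training examples.
print(epochs_seen(steps=1000, batch_size=128, num_examples=25000))
```

The result is a little over 5, which matches the "roughly 5 epochs" in the script's comment.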

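You can train and predict with a saved/loaded Estimator model simply by passing the model_dir argument to the estimator instance, and to the tf.estimator.RunConfig instance that you pass as the config argument of the premade estimator. This has worked since about TensorFlow 1.4 and still applies in TensorFlow 1.12.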

You will then be able to call classifier.train() and classifier.predict(), re-run the script while skipping the classifier.train() call, and get the same results when you call classifier.predict() again.
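
One way to implement the "skip train() on a re-run" step is to check whether model_dir already holds a checkpoint. The sketch below is only an illustration using the standard library. It assumes the model.ckpt-<step>.index file naming that Estimator checkpoints normally use, and latest_checkpoint_step and should_train are hypothetical helper names.

```python
import os
import re

# Assumption: Estimator checkpoints are written as model.ckpt-<global_step>.* files.
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(model_dir):
    """Return the highest checkpointed global step in model_dir, or None."""
    if not os.path.isdir(model_dir):
        return None
    matches = (_CKPT_INDEX.match(name) for name in os.listdir(model_dir))
    steps = [int(m.group(1)) for m in matches if m]
    return max(steps) if steps else None


def should_train(model_dir):
    """Train only when no checkpoint has been written to model_dir yet."""
    return latest_checkpoint_step(model_dir) is None
```

In the script this would guard the training call, for example: if should_train(model_path): classifier.train(...). Within TensorFlow itself, tf.train.latest_checkpoint(model_path) answers the same question and returns None when no checkpoint is found.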


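This works with a hub.text_embedding_column feature column, and also with categorical-column and embedding_column feature columns when you use a manually saved/restored VocabularyProcessor dictionary.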

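I have tried saver.save(estimator, 'my model', global_step=0), but it gives an error saying there are no variables to save.

@rhaertel80 I saw your answer elsewhere. When I try the method you described, I get an error saying: Feature sentence is not in features dictionary.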
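
The error in the last comment typically means that the features handed to the model lack the key the feature column was declared with, which is "sentence" in the question's script. The sketch below shows the expected layout in plain Python. make_features and missing_keys are hypothetical helpers for illustration, not TensorFlow APIs.

```python
# Key declared in hub.text_embedding_column(key="sentence", ...) in the question.
REQUIRED_KEYS = ("sentence",)


def make_features(texts):
    """Wrap raw strings in the dict layout the feature column looks up."""
    return {"sentence": list(texts)}


def missing_keys(features, required=REQUIRED_KEYS):
    """List the required feature keys that are absent from `features`."""
    return [key for key in required if key not in features]


print(missing_keys(make_features(["a great movie"])))  # nothing missing
print(missing_keys({"text": ["a great movie"]}))       # "sentence" is missing
```

With the question's pandas-based input functions, the equivalent is a DataFrame that has a "sentence" column, passed to tf.estimator.inputs.pandas_input_fn with shuffle=False before calling predict().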

The save attempt from the question (the code that raises the error):

sess = tf.Session()
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.save(sess, 'test-model')

The RunConfig and classifier setup from the answer:

model_path = '/path/to/model'
run_config = tf.estimator.RunConfig(model_dir=model_path,
                                    tf_random_seed=72,  # Default=None
                                    save_summary_steps=100,
                                    # save_checkpoints_steps=_USE_DEFAULT,
                                    # save_checkpoints_secs=_USE_DEFAULT,  # 600 s when both are left unset
                                    session_config=None,
                                    keep_checkpoint_max=12,  # Default=5
                                    keep_checkpoint_every_n_hours=10000,
                                    log_step_count_steps=100,
                                    train_distribute=None,
                                    device_fn=None,
                                    protocol=None,
                                    eval_distribute=None,
                                    experimental_distribute=None)
# The same directory is passed both through the config and directly.
classifier = tf.estimator.DNNLinearCombinedClassifier(
    config=run_config,
    model_dir=model_path,
    ...
)
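
To make the keep_checkpoint_max=12 setting concrete, the following sketch simulates which checkpoints would remain on disk if only the newest N are kept. It is a simplified model written for illustration: it ignores keep_checkpoint_every_n_hours, and surviving_checkpoints is a hypothetical helper name.

```python
from collections import deque


def surviving_checkpoints(saved_steps, keep_checkpoint_max):
    """Steps whose checkpoints remain when only the newest N are retained."""
    # Assumption: None or 0 means "keep every checkpoint".
    kept = deque(maxlen=keep_checkpoint_max or None)
    for step in saved_steps:
        kept.append(step)  # the oldest entry is dropped once the limit is hit
    return list(kept)


# Checkpoints saved every 100 steps up to step 1900, keeping the newest 12.
print(surviving_checkpoints(range(0, 2000, 100), keep_checkpoint_max=12))
```

Under these assumptions the checkpoints for steps 800 through 1900 survive, so a re-run that restores from model_dir resumes from step 1900.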