Python: exporting a model with tf.estimator and export_savedmodel
I am doing a deep-neural-network regression with TensorFlow. When I try to save the model with tf.estimator, I get the following error:
raise ValueError('Feature {} is not in features dictionary.'.format(key))
ValueError: Feature ad_provider is not in features dictionary.
I need to export it so that I can deploy a model that serves predictions on Google Cloud Platform.
Here is where I define the columns:
CSV_COLUMNS = [
    "ad_provider", "device", "split_group", "gold", "secret_areas",
    "scored_enemies", "tutorial_sec", "video_success"
]

FEATURES = ["ad_provider", "device", "split_group", "gold", "secret_areas",
            "scored_enemies", "tutorial_sec"]

LABEL = "video_success"

ad_provider = tf.feature_column.categorical_column_with_vocabulary_list(
    "ad_provider", ["Organic", "Apple Search Ads", "googleadwords_int",
                    "Facebook Ads", "website"])

split_group = tf.feature_column.categorical_column_with_vocabulary_list(
    "split_group", [1, 2, 3, 4])

device = tf.feature_column.categorical_column_with_hash_bucket(
    "device", hash_bucket_size=100)

secret_areas = tf.feature_column.numeric_column("secret_areas")
gold = tf.feature_column.numeric_column("gold")
scored_enemies = tf.feature_column.numeric_column("scored_enemies")
finish_tutorial_sec = tf.feature_column.numeric_column("tutorial_sec")
video_success = tf.feature_column.numeric_column("video_success")

feature_columns = [
    tf.feature_column.indicator_column(ad_provider),
    tf.feature_column.embedding_column(device, dimension=8),
    tf.feature_column.indicator_column(split_group),
    tf.feature_column.numeric_column(key="gold"),
    tf.feature_column.numeric_column(key="secret_areas"),
    tf.feature_column.numeric_column(key="scored_enemies"),
    tf.feature_column.numeric_column(key="tutorial_sec"),
]
After that, I create a function to export my model so it can serve predictions from a JSON dictionary. I am not sure my serving function is written correctly:
def json_serving_input_fn():
    """Build the serving inputs."""
    inputs = {}
    for feat in feature_columns:
        inputs[feat.name] = tf.placeholder(
            shape=[None],
            dtype=feat.dtype if hasattr(feat, 'dtype') else tf.string)
    features = {
        key: tf.expand_dims(tensor, -1)
        for key, tensor in inputs.items()
    }
    return tf.contrib.learn.InputFnOps(features, None, inputs)
Here is the rest of my code:
def main(unused_argv):
    # Normalize the 'gold', 'tutorial_sec', and 'scored_enemies' columns of the training set
    train_n = training_set
    train_n['gold'] = (train_n['gold'] - train_n['gold'].mean()) / (train_n['gold'].max() - train_n['gold'].min())
    train_n['tutorial_sec'] = (train_n['tutorial_sec'] - train_n['tutorial_sec'].mean()) / (train_n['tutorial_sec'].max() - train_n['tutorial_sec'].min())
    train_n['scored_enemies'] = (train_n['scored_enemies'] - train_n['scored_enemies'].mean()) / (train_n['scored_enemies'].max() - train_n['scored_enemies'].min())

    # Same normalization for the test set
    test_n = test_set
    test_n['gold'] = (test_n['gold'] - test_n['gold'].mean()) / (test_n['gold'].max() - test_n['gold'].min())
    test_n['tutorial_sec'] = (test_n['tutorial_sec'] - test_n['tutorial_sec'].mean()) / (test_n['tutorial_sec'].max() - test_n['tutorial_sec'].min())
    test_n['scored_enemies'] = (test_n['scored_enemies'] - test_n['scored_enemies'].mean()) / (test_n['scored_enemies'].max() - test_n['scored_enemies'].min())

    train_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=train_n,
        y=pd.Series(train_n[LABEL].values),
        batch_size=100,
        num_epochs=None,
        shuffle=True)

    test_input_fn = tf.estimator.inputs.pandas_input_fn(
        x=test_n,
        y=pd.Series(test_n[LABEL].values),
        batch_size=100,
        num_epochs=1,
        shuffle=False)

    regressor = tf.estimator.DNNRegressor(feature_columns=feature_columns,
                                          hidden_units=[40, 30, 20],
                                          model_dir="model1",
                                          optimizer='RMSProp')

    # Train
    regressor.train(input_fn=train_input_fn, steps=5)
    regressor.export_savedmodel("test", json_serving_input_fn)

    # Evaluate loss over one epoch of test_set.
    # For each step, calls `input_fn`, which returns one batch of data.
    ev = regressor.evaluate(input_fn=test_input_fn)
    loss_score = ev["loss"]
    print("Loss: {0:f}".format(loss_score))
    for key in sorted(ev):
        print("%s: %s" % (key, ev[key]))

    # Print out predictions over a slice of prediction_set.
    y = regressor.predict(input_fn=test_input_fn)
    # Array with prediction list!
    predictions = list(p["predictions"] for p in y)
    #real = list(p["real"] for p in pd.Series(training_set[LABEL].values))
    real = test_set[LABEL].values
    diff = np.subtract(real, predictions)
    diff = np.absolute(diff)
    diff = np.mean(diff)
    print("Mean Square Error of Test Set = ", diff * diff)
Besides the issue you mentioned, I expect you will run into several other problems:
- You are using tf.estimator.DNNRegressor, which was introduced in TensorFlow 1.3, but CloudML Engine only officially supports TF 1.2.
- You are normalizing the features in the pandas DataFrame, and that will not happen at serving time (unless you also do it in the client). That introduces skew, and you will get poor prediction results.
Consider using tf.contrib.learn.DNNRegressor instead; it only requires minor modifications:
regressor = tf.contrib.learn.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[40, 30, 20],
    model_dir="model1",
    optimizer='RMSProp'
)
regressor.fit(input_fn=train_input_fn, steps=5)
regressor.export_savedmodel("test", json_serving_input_fn)
Note fit instead of train. (NB: your json_serving_input_fn is actually already written for TF 1.2 and is incompatible with TF 1.3, which happens to work out fine here.)
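If you do later move to TF 1.3, the core tf.estimator API expects a ServingInputReceiver rather than the contrib InputFnOps. A hedged sketch of what that would look like (only two columns shown; the function name is mine):

```python
def tf13_json_serving_input_fn():
    """TF 1.3-style serving input fn (sketch; only two columns shown)."""
    import tensorflow as tf  # TF 1.3+ assumed, matching the question's stack

    # Placeholders for the raw input columns, keyed by column name.
    inputs = {
        "ad_provider": tf.placeholder(dtype=tf.string, shape=[None]),
        "gold": tf.placeholder(dtype=tf.float32, shape=[None]),
        # the remaining raw input columns would be declared the same way
    }
    features = {key: tf.expand_dims(tensor, -1) for key, tensor in inputs.items()}
    # Core-API counterpart of tf.contrib.learn.InputFnOps:
    return tf.estimator.export.ServingInputReceiver(features, inputs)
```

This is not needed as long as you stay on TF 1.2 with the contrib estimator.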
Now, the root cause of the error you are seeing is that the column/feature ad_provider is not in the list of inputs and features (you do have ad_provider_indicator, though). That is because you are iterating over feature_columns rather than over the original list of input columns. The way to fix it is to iterate over the actual inputs instead of the feature columns; however, we also need to know the types (simplified here to just a few columns):
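A sketch of that approach, keyed on the raw input columns; the dtype mapping below is my inference from the question's column definitions (string for the vocabulary and hash columns, int64 for the integer vocabulary, float32 for numerics), not part of the original answer:

```python
# Raw input columns mapped to placeholder dtypes. These dtypes are an
# assumption inferred from the question's feature-column definitions.
INPUT_COLUMN_DTYPES = {
    "ad_provider": "string",
    "device": "string",
    "split_group": "int64",
    "gold": "float32",
    "secret_areas": "float32",
    "scored_enemies": "float32",
    "tutorial_sec": "float32",
}

def json_serving_input_fn():
    """Build serving inputs from the raw input columns, not feature_columns."""
    import tensorflow as tf  # TF 1.2 assumed, matching the rest of the answer

    # One placeholder per raw input column, with its declared dtype.
    inputs = {
        name: tf.placeholder(shape=[None], dtype=getattr(tf, dtype_name))
        for name, dtype_name in INPUT_COLUMN_DTYPES.items()
    }
    features = {key: tf.expand_dims(tensor, -1) for key, tensor in inputs.items()}
    return tf.contrib.learn.InputFnOps(features, None, inputs)
```

The key point is that the dictionary keys are now the raw column names (ad_provider, device, ...) that the feature columns look up, instead of derived names like ad_provider_indicator.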
Finally, to normalize the data, you probably want to do so in the graph. You can try using an existing preprocessing tool for that, or write a custom estimator that performs the transformations and delegates the actual model implementation to DNNRegressor.
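For instance, a minimal sketch of in-graph normalization using numeric_column's normalizer_fn argument; the statistics below are made-up placeholders you would compute once from your training set:

```python
# Assumed training-set statistics for the 'gold' column; the values are
# made up for illustration only.
GOLD_MEAN, GOLD_MIN, GOLD_MAX = 120.0, 0.0, 5000.0

def normalize_gold(x):
    # Same (x - mean) / (max - min) scaling the question applies in pandas,
    # expressed as a function that can run inside the TensorFlow graph.
    return (x - GOLD_MEAN) / (GOLD_MAX - GOLD_MIN)

def build_gold_column():
    import tensorflow as tf  # TF 1.x assumed; numeric_column takes normalizer_fn
    # normalizer_fn is applied at both training and serving time, so the
    # skew between pandas-side and serving-side preprocessing goes away.
    return tf.feature_column.numeric_column("gold", normalizer_fn=normalize_gold)
```

With this, you would drop the pandas normalization from main() and use build_gold_column() in feature_columns instead of the plain numeric_column("gold").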
I made another answer explaining my problem in detail.