How do I allow text input to a TensorFlow model?

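I'm working on a custom text classification model in TensorFlow, and I now want to deploy it to production with TensorFlow Serving. The model makes its predictions from text embeddings computed by a separate model, and that model requires the raw text to be encoded as a vector.

The way I have this working now is somewhat disjointed: one service does all the text preprocessing, then computes the embeddings, and then sends the result to the text classifier as an embedded text vector. It would be great if we could bundle all of this into a single TensorFlow Serving model, especially the initial text preprocessing step.

That is where I'm stuck. How do you construct a tensor (or other TensorFlow primitive) that takes raw text as input? And do you need to do anything special to specify the lookup table for the token-to-vector-component mapping so that it is saved as part of the model bundle?

For reference, here is a rough approximation of what I have now: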

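import tensorflow as tf  # TensorFlow 1.x API

# Current input: a batch of 510-dimensional float vectors (the precomputed embeddings)
input = tf.placeholder(tf.float32, [None, 510], name='input')

# lots of steps omitted for brevity/clarity

outputs = tf.linalg.matmul(outputs, terminal_layer, transpose_b=True, name='output')

sess = tf.Session()
# The second argument is the export directory for the SavedModel
tf.saved_model.simple_save(sess,
                           'model.pb',
                           inputs={'input': input}, outputs={'output': outputs})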

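This turned out to be relatively simple, thanks to tf.lookup.StaticVocabularyTable, which is provided as part of the TensorFlow standard library.

My model uses a bag-of-words approach rather than preserving token order, although preserving order would be a fairly simple change to the code.

Assuming you have a list object that encodes your vocabulary (which I've called vocab) and a corresponding matrix of term/token embeddings to use (which I've called raw_term_embeddings, since I coerce it to a tensor), the code will look something like this: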

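import numpy as np
import tensorflow as tf  # TensorFlow 1.x API; `sess` is an existing tf.Session

# Map each vocabulary token to its row index (StaticVocabularyTable requires int64 values)
initializer = tf.lookup.KeyValueTensorInitializer(vocab, np.arange(len(vocab), dtype=np.int64))
lut = tf.lookup.StaticVocabularyTable(initializer, 1) # the one here is the out of vocab size
lut.initializer.run(session=sess) # pushes the LUT onto the session

input = tf.placeholder(tf.string, [None, None], name='input')

ones_at = lut.lookup(input)
# One-hot each token id, then sum over axis 0 to get per-term counts
encoded_text = tf.math.reduce_sum(tf.one_hot(ones_at, tf.dtypes.cast(lut.size(), np.int32)), axis=0, keepdims=True)

# I didn't build an embedding for the out of vocabulary token,
# so a row of zeros stands in for it (lut.size() includes the OOV bucket)
raw = np.asarray(raw_term_embeddings)
term_embeddings = tf.convert_to_tensor(np.vstack([raw, np.zeros((1, raw.shape[1]))]), dtype=tf.float32)
embedded_text = tf.linalg.matmul(encoded_text, term_embeddings)

# then use embedded_text for the remainder of the model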
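To make the lookup, one-hot sum, and matrix product above more concrete, here is a minimal pure-Python sketch of the same arithmetic for a single document. It is an illustration only, not the answer's TensorFlow code: the toy vocabulary, the embedding values, and the helper names bag_of_words and embed are hypothetical, and it assumes exactly one out-of-vocabulary bucket whose embedding row is all zeros.

```python
def bag_of_words(tokens, vocab):
    """Count tokens per vocabulary index; unknown tokens share one OOV slot."""
    index = {token: i for i, token in enumerate(vocab)}
    oov_id = len(vocab)  # single OOV bucket placed after the real vocabulary
    counts = [0.0] * (len(vocab) + 1)
    for token in tokens:
        counts[index.get(token, oov_id)] += 1.0
    return counts


def embed(counts, embeddings):
    """Multiply the count vector by the embedding matrix (one row per term)."""
    dim = len(embeddings[0])
    return [sum(c * row[d] for c, row in zip(counts, embeddings))
            for d in range(dim)]


# Hypothetical toy data: three known terms with 2-dimensional embeddings.
vocab = ["good", "bad", "movie"]
raw_term_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Extra zero row so the OOV slot contributes nothing to the result.
term_embeddings = raw_term_embeddings + [[0.0, 0.0]]

counts = bag_of_words(["good", "movie", "movie", "unseen"], vocab)
embedded_text = embed(counts, term_embeddings)

print(counts)         # [1.0, 0.0, 2.0, 1.0]
print(embedded_text)  # [2.0, 1.0]
```

Because the counts are summed before the product, token order is lost, which is the bag-of-words behavior described above.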

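One more small trick: make sure to pass legacy_init_op=tf.tables_initializer() to the save function, which prompts TensorFlow to initialize the lookup table used for text encoding when the model is loaded.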
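As a rough sketch of how that save call might look end to end, the example below builds a tiny graph containing only the lookup table, exports it with tf.saved_model.simple_save, and reloads it to check that the table is initialized on load. It makes several assumptions that are not part of the original answer: it uses the tensorflow.compat.v1 API (TF 1.14+ or TF 2.x), the three-word vocabulary is hypothetical, the export goes to a temporary directory, and the model's output is just the token ids rather than a classifier.

```python
import os
import tempfile

# Assumption: the v1 compatibility API is available (TF 1.14+ or TF 2.x).
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

vocab = ["good", "bad", "movie"]  # hypothetical vocabulary
# simple_save requires an export directory that does not exist yet.
export_dir = os.path.join(tempfile.mkdtemp(), "text_model")

with tf.Graph().as_default(), tf.Session() as sess:
    initializer = tf.lookup.KeyValueTensorInitializer(
        tf.constant(vocab),
        tf.constant(list(range(len(vocab))), dtype=tf.int64))
    lut = tf.lookup.StaticVocabularyTable(initializer, num_oov_buckets=1)

    text = tf.placeholder(tf.string, [None, None], name="input")
    token_ids = tf.identity(lut.lookup(text), name="output")

    sess.run(tf.tables_initializer())  # initialize the table in this session
    tf.saved_model.simple_save(
        sess,
        export_dir,
        inputs={"input": text},
        outputs={"output": token_ids},
        # Stored with the model so the table is re-initialized at load time.
        legacy_init_op=tf.tables_initializer())

# Reload into a fresh graph; no manual table initialization is done here.
with tf.Graph().as_default(), tf.Session() as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_dir)
    restored_ids = sess.run(
        "output:0", feed_dict={"input:0": [["good", "unseen"]]})

print(restored_ids.tolist())
```

With a single out-of-vocabulary bucket, any unknown token should map to the id len(vocab), so the lookup keeps working after reload without shipping the vocabulary separately.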

I want to save the StaticVocabularyTable object, preferably inside the TensorFlow model object, and use that model in a REST service. Can you point to any resources on bundling the vocabulary into the TF model?