Splitting a word into its constituent characters in TensorFlow


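I have a tensor placeholder of shape [None, None] and type string. For example, it looks like this:

[["Hello", "World"], ["American", "people"]]

Now I want to convert this 2-D tensor into a 3-D tensor in which every word is split into its constituent characters. The output would look like:

[[["H", "e", "l", "l", "o"], ["W", "o", "r", "l", "d"]], [["A", "m", "e", "r", "i", "c", "a", "n"], ["p", "e", "o", "p", "l", "e"]]]

Since each word has a different number of characters, the new tensor should pad the shorter words with spaces. Is there a way to do this in TensorFlow?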

This should work:

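import tensorflow as tf  # TensorFlow 1.x graph-mode API
import tensorflow_transform as tft

# 2-D batch of words; both dimensions are unknown until run time.
words = tf.placeholder(shape=[None, None], dtype=tf.string, name="words")
# Flatten to 1-D, because tf.string_split expects a vector of strings.
words_flatten = tf.reshape(words, [tf.shape(words)[0] * tf.shape(words)[1]])
# An empty delimiter splits each word into single bytes (a SparseTensor).
words_split = tf.string_split(words_flatten, delimiter="")
# ngram_range=(1, 3) also emits 2- and 3-character n-grams;
# use (1, 1) to keep single characters only.
ngrams = tft.ngrams(words_split, ngram_range=(1, 3), separator="")
# Insert "" into rows with no tokens (empty words), then tighten the shape.
tokens = tf.sparse_reset_shape(tf.sparse_fill_empty_rows(ngrams, "")[0])
# Densify, padding short rows with "", and restore the two leading dimensions.
tokens_dense = tf.reshape(
    tf.sparse_to_dense(tokens.indices, tokens.dense_shape, tokens.values, default_value=""),
    [tf.shape(words)[0], tf.shape(words)[1], -1]
)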

tokens_dense is the desired output.
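
For readers on TensorFlow 2.x, where tf.placeholder and tf.string_split are no longer part of the main API, the same result can probably be reached more directly with ragged tensors. The following is only a minimal sketch, not the method from the answer above: it assumes TensorFlow 2.x with eager execution and UTF-8 encoded input, and the helper name split_words_into_chars and its pad parameter are made up for illustration.

```python
import tensorflow as tf


def split_words_into_chars(words, pad=" "):
    """Hypothetical helper: turn a 2-D string tensor into a padded 3-D one.

    Assumes TensorFlow 2.x and UTF-8 encoded strings.
    """
    # unicode_split returns a RaggedTensor of shape [d0, d1, (num_chars)],
    # with one element per Unicode character of each word.
    chars = tf.strings.unicode_split(words, "UTF-8")
    # to_tensor pads every word up to the longest one with `pad`.
    return chars.to_tensor(default_value=pad)


words = tf.constant([["Hello", "World"], ["American", "people"]])
padded = split_words_into_chars(words)
print(padded.shape)  # three dimensions: batch rows, words per row, characters
print(padded[0, 0].numpy())
```

Unlike splitting on an empty delimiter, which works byte by byte, unicode_split keeps multi-byte characters intact. Padding with a space follows the request in the question; passing pad="" would mimic the default_value="" used in the answer.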