TensorFlow: building a custom map function with tf.function in a tf.data input pipeline

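I am trying to write a tf.function-annotated map function in Python for a TensorFlow tf.data input pipeline. The function should convert a string into a one-hot encoded tensor. The input strings match the pattern [ab12]+. (The real strings contain more letters and digits, but these are enough for the example below.)

Here is a minimal example: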

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = np.zeros(DIM * 10, dtype=np.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output[DIM_A + DIM * pos] = 1
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output[DIM_B + DIM * pos] = 1
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s))

When the function computes the index at which to set a 1 in the output tensor, I get the following error:

NotImplementedError: in user code:

<ipython-input-14-baa9b1605ae2>:18 my_func  *
    output[DIM_A + DIM * pos] = 1
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py:749 __array__
    " array.".format(self.name))

NotImplementedError: Cannot convert a symbolic Tensor (add:0) to a numpy array.

The code works in eager mode but breaks when the graph is built.
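The difference between the two modes can be reproduced in a few lines. The sketch below is an illustration only: set_one and graph_set_one are hypothetical names, and the exact exception type raised during tracing is assumed to vary between TensorFlow versions, so it is caught broadly.

```python
import numpy as np
import tensorflow as tf

def set_one(index):
    # In eager mode `index` holds a concrete value, so NumPy can use it
    # as an integer index into the array.
    out = np.zeros(4, dtype=np.float32)
    out[index] = 1
    return out

# Eager call: works, because the tensor can be converted to an integer.
eager_result = set_one(tf.constant(2))

# Graph call: while tracing, `index` is a symbolic tensor without a value,
# so NumPy cannot convert it and the assignment fails.
graph_set_one = tf.function(set_one)
try:
    graph_set_one(tf.constant(2))
    graph_failed = False
except Exception as err:  # exact type assumed to depend on the TF version
    graph_failed = True
    print("graph mode failed with", type(err).__name__)
```

This matches the traceback above: the failure comes from NumPy trying to turn a symbolic tensor into an index, not from the index arithmetic itself.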

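I have a working version that uses a dynamically sized TensorArray to first build a sparse version of the output tensor and then converts it to a dense tensor, but it is very slow. Using a fixed-size TensorArray in place of the numpy array is also very slow. I am trying to make it faster.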

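1) You cannot use numpy in graph mode, so output should be created with tf.zeros rather than np.zeros.
2) You cannot assign values to the elements of a tf.zeros tensor, so you should probably construct it with tf.one_hot from the start.
Minimal working example: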
import tensorflow as tf
import numpy as np 

DIM = 100
DIM_A = 1
DIM_B = 2

pos = tf.Variable(0, dtype=tf.int32)

@tf.function
def my_func(string):
  output = tf.zeros(DIM * 10, dtype=tf.float32)
  pos.assign(0)
  for ch in tf.strings.bytes_split(string):
    if tf.math.equal(ch, tf.constant("1")):
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("2")):
        pos.assign_add(2)
    elif tf.math.equal(ch, tf.constant("a")):
        output = tf.one_hot(DIM_A + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
    elif tf.math.equal(ch, tf.constant("b")):
        output = tf.one_hot(DIM_B + DIM * pos, DIM * 10, dtype=tf.float32)
        pos.assign_add(1)
  return output

s = b"a1b2b"
print(my_func(s).numpy())

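This function prints a one-hot encoded vector. I don't know whether the indices are the ones you want, so you will have to double-check that the offsets are correct.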
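Another way to work around the missing item assignment is tf.tensor_scatter_nd_update, which returns a copy of a tensor with selected elements replaced. The sketch below is one possible variant, not the code from the answer: my_func_scatter is a hypothetical name, and pos is kept as a local tensor instead of a global tf.Variable, which is an assumption made here to keep the function free of shared state.

```python
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2

@tf.function
def my_func_scatter(string):
    output = tf.zeros([DIM * 10], dtype=tf.float32)
    pos = tf.constant(0, dtype=tf.int32)   # local loop state, no tf.Variable
    one = tf.ones([1], dtype=tf.float32)   # the value written at each hit
    for ch in tf.strings.bytes_split(string):
        if tf.equal(ch, "1"):
            pos += 1
        elif tf.equal(ch, "2"):
            pos += 2
        elif tf.equal(ch, "a"):
            # Indices must have shape [num_updates, rank of output] = [1, 1].
            idx = tf.reshape(DIM_A + DIM * pos, [1, 1])
            output = tf.tensor_scatter_nd_update(output, idx, one)
            pos += 1
        elif tf.equal(ch, "b"):
            idx = tf.reshape(DIM_B + DIM * pos, [1, 1])
            output = tf.tensor_scatter_nd_update(output, idx, one)
            pos += 1
    return output

result = my_func_scatter(tf.constant(b"a1b2b")).numpy()
print(result.nonzero()[0])  # positions of the 1s for the sample string
```

For the sample string this keeps every 1 that was set along the way, at indices 1, 202 and 502. Whether it is faster than accumulating tf.one_hot vectors would have to be measured.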
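My output tensor contains multiple 1s, but changing "output = tf.one_hot()" to "output += tf.one_hot()" solves that. However, this is still very slow: my tensor has about 1000 elements, so every update requires about 1000 additions. The numpy array in my example works fine as long as the computed index at which the 1 is set does not depend on "pos".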
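Since the position of each character is just a running sum of the steps taken by the characters before it, the loop can in principle be replaced by a cumulative sum followed by a single scatter. The following is a sketch under that assumption, for the four characters of the simplified example only; encode_vectorized and OUT_SIZE are hypothetical names, and it has not been benchmarked against the loop versions.

```python
import tensorflow as tf

DIM = 100
DIM_A = 1
DIM_B = 2
OUT_SIZE = DIM * 10

@tf.function
def encode_vectorized(string):
    chars = tf.strings.bytes_split(string)  # 1-D string tensor, one byte each
    is_a = tf.equal(chars, "a")
    is_b = tf.equal(chars, "b")
    is_1 = tf.equal(chars, "1")
    is_2 = tf.equal(chars, "2")
    # How far each character advances pos: 1 for "a", "b", "1"; 2 for "2".
    step = tf.cast(is_a | is_b | is_1, tf.int32) + 2 * tf.cast(is_2, tf.int32)
    # Exclusive cumulative sum = value of pos *before* each character.
    pos = tf.cumsum(step, exclusive=True)
    # Per-character offset; only meaningful where the character is a letter.
    offset = DIM_A * tf.cast(is_a, tf.int32) + DIM_B * tf.cast(is_b, tf.int32)
    indices = tf.boolean_mask(offset + DIM * pos, is_a | is_b)
    updates = tf.ones_like(indices, dtype=tf.float32)
    # One scatter writes all 1s at once into a zero vector of length OUT_SIZE.
    return tf.scatter_nd(indices[:, tf.newaxis], updates, [OUT_SIZE])

vec = encode_vectorized(tf.constant(b"a1b2b")).numpy()

# Usage inside a tf.data pipeline; each element is a scalar string.
ds = tf.data.Dataset.from_tensor_slices([b"a1b2b", b"ab"]).map(encode_vectorized)
for encoded in ds:
    print(tf.where(encoded > 0)[:, 0].numpy())
```

For b"a1b2b" this sets the same indices as the loop (1, 202 and 502). Like the original, it assumes the string never advances pos beyond the output size; an index outside the vector is not handled here.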