Python TensorFlow: RaggedTensor.from_tensor flattens the values of all arrays into a single array instead of keeping the original number of arrays


In the official documentation, RaggedTensor.from_tensor works as follows:

import tensorflow as tf

x = [[1, 3, -1, -1], [2, -1, -1, -1], [4, 5, 8, 9]]
print(tf.RaggedTensor.from_tensor(x, padding=-1))

Output:

 <tf.RaggedTensor [[1, 3], [2], [4, 5, 8, 9]]>
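To make the padding rule concrete, here is a minimal pure-Python sketch of what from_tensor(x, padding=-1) does to a 2-D input, assuming the documented behavior that only a trailing run of padding values (a row suffix made entirely of padding) is dropped. The name strip_trailing_padding is hypothetical and not part of TensorFlow.

```python
def strip_trailing_padding(rows, padding):
    """Hypothetical helper (not TensorFlow): drop the trailing padding of each row.

    Mimics, for 2-D input, how tf.RaggedTensor.from_tensor treats `padding`:
    only a suffix consisting entirely of padding is removed; a padding value
    that appears before a non-padding value is kept.
    """
    result = []
    for row in rows:
        end = len(row)
        # Walk backwards while the last remaining element is padding.
        while end > 0 and row[end - 1] == padding:
            end -= 1
        result.append(list(row[:end]))
    return result


x = [[1, 3, -1, -1], [2, -1, -1, -1], [4, 5, 8, 9]]
print(strip_trailing_padding(x, padding=-1))  # [[1, 3], [2], [4, 5, 8, 9]]
```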

This is the output I get:

[array([[ 0,  1,  2,  3, -1],
       [ 2,  3,  4, -1, -1],
       [ 3,  6,  5,  4,  3]]), tf.RaggedTensorValue(values=array([0, 1, 2, 3, 2, 3, 4, 3, 6, 5, 4, 3]), row_splits=array([ 0,  4,  7, 12]))]
[array([[ 3,  9, -1, -1],
       [ 0,  1,  2,  3],
       [ 2,  3,  4, -1]]), tf.RaggedTensorValue(values=array([3, 9, 0, 1, 2, 3, 2, 3, 4]), row_splits=array([0, 2, 6, 9]))]
[array([[ 3,  6,  5,  4,  3],
       [ 3,  9, -1, -1, -1],
       [ 0,  1,  2,  3, -1]]), tf.RaggedTensorValue(values=array([3, 6, 5, 4, 3, 3, 9, 0, 1, 2, 3]), row_splits=array([ 0,  5,  7, 11]))]

Below is the complete code, as a minimal example that reproduces the result:

!pip install -q tf-nightly
import math
import numpy as np
import tensorflow as tf

#Generate Test data
cells = np.array([[0,1,2,3], [2,3,4], [3,6,5,4,3], [3,9]], dtype=object)  # rows have different lengths
mells = np.array([[0], [2], [3], [9]])
print(cells)

#Write test data to tf.records file
writer = tf.python_io.TFRecordWriter('test.tfrecords')
for index in range(mells.shape[0]):
    example = tf.train.Example(features=tf.train.Features(feature={
        'num_value':tf.train.Feature(int64_list=tf.train.Int64List(value=mells[index])),
        'list_value':tf.train.Feature(int64_list=tf.train.Int64List(value=cells[index]))
    }))
    writer.write(example.SerializeToString())
writer.close()

#Open tfrecords file and generate batch from data 
filenames = ["test.tfrecords"]
dataset = tf.data.TFRecordDataset(filenames)
def _parse_function(example_proto):
    keys_to_features = {'num_value':tf.VarLenFeature(tf.int64),
                        'list_value':tf.VarLenFeature(tf.int64)}
    parsed_features = tf.parse_single_example(example_proto, keys_to_features)
    return tf.sparse.to_dense(parsed_features['num_value']), \
           tf.sparse.to_dense(parsed_features['list_value'])
# Parse the record into tensors.
dataset = dataset.map(_parse_function)
# Shuffle the dataset
dataset = dataset.shuffle(buffer_size=1)
# Repeat the input indefinitely
dataset = dataset.repeat()  
# Generate batches
dataset = dataset.padded_batch(3, padded_shapes=([None], [None]),
                               padding_values=(tf.constant(-1, dtype=tf.int64),
                                               tf.constant(-1, dtype=tf.int64)))
iterator = dataset.make_one_shot_iterator()
i, data = iterator.get_next()

#Remove padding
data2= tf.RaggedTensor.from_tensor(data, padding=-1)

#Print data
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run([ data, data2 ]))
    print(sess.run([ data, data2 ]))
    print(sess.run([ data, data2 ]))
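Each printed data array has a different width because padded_batch pads every element of a batch to the length of the longest element in that batch, not to a global maximum. The following is a minimal pure-Python sketch of that behavior for 1-D elements; the function name pad_batches is hypothetical. Since shuffle(buffer_size=1) leaves the order unchanged, the sketch stands in for repeat() by simply concatenating the list with itself.

```python
def pad_batches(elements, batch_size, padding_value):
    """Hypothetical sketch of padded_batch for 1-D elements (not TensorFlow code).

    Groups `elements` into batches of `batch_size` and pads each element
    with `padding_value` up to the longest element *in that batch*.
    """
    batches = []
    for start in range(0, len(elements), batch_size):
        batch = elements[start:start + batch_size]
        width = max(len(e) for e in batch)
        batches.append([list(e) + [padding_value] * (width - len(e)) for e in batch])
    return batches


cells = [[0, 1, 2, 3], [2, 3, 4], [3, 6, 5, 4, 3], [3, 9]]
# Stand-in for repeat(): three passes over the data, first 9 elements = 3 batches.
repeated = (cells * 3)[:9]
for batch in pad_batches(repeated, batch_size=3, padding_value=-1):
    print(batch)
```

Under these assumptions the three batches match the three padded data arrays printed above (widths 5, 4 and 5).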

Here is the official TensorFlow guide on ragged tensors, and the official TensorFlow documentation.

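It turns out that it does not actually flatten the data. I don't know exactly how it works, but it looks like it keeps track of where the row breaks are and then applies them when the value is evaluated: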

tf.RaggedTensorValue(values=array([3, 6, 5, 4, 3, 3, 9, 0, 1, 2, 3]), row_splits=array([ 0,  5,  7, 11]))

row_splits keeps track of where each row is split off from the flat list of values.
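Using the numbers from the RaggedTensorValue above, a minimal pure-Python sketch shows how the rows can be recovered: row i is the slice values[row_splits[i]:row_splits[i + 1]]. TensorFlow does this reconstruction itself (for example in RaggedTensorValue.to_list()); the helper name rows_from_row_splits is hypothetical and only illustrates the idea.

```python
def rows_from_row_splits(values, row_splits):
    """Hypothetical helper: rebuild the rows of a 2-D ragged value.

    Row i is values[row_splits[i]:row_splits[i + 1]], so a row_splits
    vector with n + 1 entries describes n rows.
    """
    return [values[row_splits[i]:row_splits[i + 1]]
            for i in range(len(row_splits) - 1)]


values = [3, 6, 5, 4, 3, 3, 9, 0, 1, 2, 3]
row_splits = [0, 5, 7, 11]
print(rows_from_row_splits(values, row_splits))
# [[3, 6, 5, 4, 3], [3, 9], [0, 1, 2, 3]]
```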

Here are some results with eager execution:

for _ in range(4):
    i, data = iterator.get_next()

    # Remove padding
    data2 = tf.RaggedTensor.from_tensor(data, padding=-1)
    print(data2)

Results:

<tf.RaggedTensor [[3, 9], [0, 1, 2, 3], [2, 3, 4]]>
<tf.RaggedTensor [[3, 6, 5, 4, 3], [3, 9], [0, 1, 2, 3]]>
<tf.RaggedTensor [[2, 3, 4], [3, 6, 5, 4, 3], [3, 9]]>
<tf.RaggedTensor [[0, 1, 2, 3], [2, 3, 4], [3, 6, 5, 4, 3]]>

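As you discovered, RaggedTensors are not actually flattened. Internally, a 2D RaggedTensor is encoded using two tensors/arrays: one containing a flat list of values, and one containing row splits. For more details on how RaggedTensors are encoded using underlying tensors/arrays, see: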
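To illustrate the two-array encoding just described, here is a minimal pure-Python sketch that turns a list of rows into a flat values list plus row_splits. The helper name encode_ragged is hypothetical; it reproduces the encoding shown for the third batch in the question, and row lengths fall out as differences of adjacent splits.

```python
def encode_ragged(rows):
    """Hypothetical helper: encode a list of rows as (values, row_splits).

    values is the concatenation of all rows; row_splits[i] is the offset in
    values where row i starts, and the final entry equals len(values).
    """
    values = []
    row_splits = [0]
    for row in rows:
        values.extend(row)
        row_splits.append(len(values))
    return values, row_splits


values, row_splits = encode_ragged([[3, 6, 5, 4, 3], [3, 9], [0, 1, 2, 3]])
print(values)      # [3, 6, 5, 4, 3, 3, 9, 0, 1, 2, 3]
print(row_splits)  # [0, 5, 7, 11]
# Row lengths are the differences between adjacent splits.
print([b - a for a, b in zip(row_splits, row_splits[1:])])  # [5, 2, 4]
```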

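The confusion probably comes from the way RaggedTensorValues are displayed when printed. Python has two string conversion methods: __str__ and __repr__. __str__ is used when a value is printed on its own, while __repr__ is used for a value embedded in some larger structure (such as a list).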

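For RaggedTensorValue, the __str__ method returns "<tf.RaggedTensorValue %s>" % self.to_list(). That is, it shows you the value formatted as a list. But the __repr__ method returns "tf.RaggedTensorValue(values=%r, row_splits=%r)" % (self._values, self._row_splits). That is, it shows the underlying numpy arrays used to encode the RaggedTensorValue. Because sess.run([data, data2]) returns a list, printing that list uses __repr__ for each element, which is why you see the encoding rather than the nested lists.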
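The __str__ versus __repr__ distinction can be reproduced without TensorFlow. The class below is a hypothetical toy stand-in (ToyRaggedValue is not a TensorFlow class) that only mimics the two format strings quoted above: printing the object directly uses __str__, while printing a list that contains it uses __repr__ for the element.

```python
class ToyRaggedValue:
    """Hypothetical stand-in for RaggedTensorValue (not TensorFlow code).

    It only mimics the two string formats described above.
    """

    def __init__(self, values, row_splits):
        self._values = values
        self._row_splits = row_splits

    def to_list(self):
        s = self._row_splits
        return [self._values[s[i]:s[i + 1]] for i in range(len(s) - 1)]

    def __str__(self):
        # Used by print(v) and str(v): show the nested-list view.
        return "<ToyRaggedValue %s>" % self.to_list()

    def __repr__(self):
        # Used for elements inside containers: show the underlying encoding.
        return "ToyRaggedValue(values=%r, row_splits=%r)" % (
            self._values, self._row_splits)


v = ToyRaggedValue([1, 2, 3], [0, 2, 3])
print(v)    # <ToyRaggedValue [[1, 2], [3]]>
print([v])  # [ToyRaggedValue(values=[1, 2, 3], row_splits=[0, 2, 3])]
```

By the same reasoning, printing the elements of the sess.run result one at a time (for value in sess.run([data, data2]): print(value)) would go through __str__ and show the ragged rows as nested lists.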