What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?


python tensorflow


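I have a dataset represented as a NumPy matrix of shape (num_features, num_examples), and I want to convert it to the TensorFlow type tf.data.Dataset.

I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which one is right, and why?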

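The TensorFlow documentation states that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors should have the same size in the 0th dimension.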
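To illustrate the nested-structure point, here is a minimal sketch assuming TF 2.x with eager execution; the feature names height and weight are made up for the example. With a dict, from_tensor_slices slices every value along its first dimension (so every value needs the same length there), while from_tensors keeps the whole dict as one element.

```python
import tensorflow as tf

# Hypothetical features: each value holds 3 examples along axis 0
features = {
    "height": tf.constant([1.7, 1.8, 1.6]),
    "weight": tf.constant([65.0, 80.0, 55.0]),
}

sliced = list(tf.data.Dataset.from_tensor_slices(features))  # one dict per example
whole = list(tf.data.Dataset.from_tensors(features))         # one dict, full tensors

print(len(sliced), sliced[0]["height"].shape)  # 3 ()
print(len(whole), whole[0]["height"].shape)    # 1 (3,)
```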
from_tensors combines the input and returns a dataset with a single element:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[1, 2],
        [3, 4]], dtype=int32)>]
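For contrast, a minimal sketch (assuming TF 2.x with eager execution) that feeds the same t to from_tensor_slices, which returns one element per row along the first dimension:

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

# from_tensors: the whole tensor becomes a single element
single = list(tf.data.Dataset.from_tensors(t))
# from_tensor_slices: t is split along axis 0, one element per row
rows = list(tf.data.Dataset.from_tensor_slices(t))

print(len(single), single[0].shape)        # 1 (2, 2)
print(len(rows), rows[0].shape)            # 2 (2,)
print([r.numpy().tolist() for r in rows])  # [[1, 2], [3, 4]]
```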

1) The main difference is that the nested elements passed to from_tensor_slices must have the same size in the 0th dimension:
import tensorflow as tf

# from_tensor_slices raises ValueError: Dimensions 10 and 9 are not compatible
try:
    dataset1 = tf.data.Dataset.from_tensor_slices(
        (tf.random.uniform([10, 4]), tf.random.uniform([9])))
except ValueError as e:
    print(e)
# OK: the first dimension (10) is the same for both tensors
dataset2 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([10, 4]), tf.random.uniform([10])))
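By contrast, from_tensors does not slice anything, so it places no constraint on the first dimensions. A minimal sketch, assuming TF 2.x:

```python
import tensorflow as tf

# Mismatched first dimensions (10 vs 9) are fine: the tuple is kept whole
ds = tf.data.Dataset.from_tensors(
    (tf.random.uniform([10, 4]), tf.random.uniform([9])))

elements = list(ds)
print(len(elements))                               # 1
print(elements[0][0].shape, elements[0][1].shape)  # (10, 4) (9,)
```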

2) The second difference shows up when the input to a tf.data.Dataset is a list. For example:
import tensorflow as tf

dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])

dataset2 = tf.data.Dataset.from_tensors(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])

print(dataset1)  # shapes: (2, 3)
print(dataset2)  # shapes: (2, 2, 3)

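In the example above, from_tensors creates a single element holding a 3D tensor of shape (2, 2, 3), while from_tensor_slices packs the two input tensors together and slices them along the first dimension, yielding two elements of shape (2, 3). This can be handy if you have different sources for different image channels and want to combine them into one RGB image tensor.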
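One way to see what from_tensor_slices does with a list: it behaves as if the list were first stacked into one tensor of shape (2, 2, 3) and then sliced along the first axis. A sketch assuming TF 2.x, using tf.stack to make the stacking step explicit:

```python
import tensorflow as tf

a = tf.random.uniform([2, 3])
b = tf.random.uniform([2, 3])

from_list = list(tf.data.Dataset.from_tensor_slices([a, b]))
from_stacked = list(tf.data.Dataset.from_tensor_slices(tf.stack([a, b])))

# Both yield two (2, 3) elements: first a, then b
same = all(bool(tf.reduce_all(tf.equal(x, y)))
           for x, y in zip(from_list, from_stacked))
first_is_a = bool(tf.reduce_all(tf.equal(from_list[0], a)))
print(len(from_list), same, first_is_a)  # 2 True True
```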
3) As mentioned in the previous answer, from_tensors converts the input tensors into one big tensor:
import tensorflow as tf

# TF 2.x runs eagerly by default; on TF 1.x, call
# tf.enable_eager_execution() here first.

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))

dataset2 = tf.data.Dataset.from_tensors(
    (tf.random.uniform([4, 2]), tf.random.uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30*'-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])

Output:
element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
------------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))

Try this:
import tensorflow as tf  # TF 2.x (eager by default); originally written for 1.13.1

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n=========     from_tensors     ===========")
ds = tf.data.Dataset.from_tensors(t1)
# TF 2.x: element_spec replaces ds.output_types / ds.output_shapes from TF 1.x
print(ds.element_spec.dtype, end=' : ')
print(ds.element_spec.shape)
for e in ds:
    print(e)

print("\n=========   from_tensor_slices    ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.element_spec.dtype, end=' : ')
print(ds.element_spec.shape)
for e in ds:
    print(e)

Output:
=========     from_tensors     ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)

=========   from_tensor_slices    ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)
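The two outputs above are related: slicing is the inverse of batching. A sketch assuming a recent TF 2.x release where Dataset.unbatch is available: unbatching the single-element from_tensors dataset gives the same rows as from_tensor_slices, and batching those rows back in groups of 3 restores one (3, 2) element.

```python
import tensorflow as tf

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

# Split the single (3, 2) element into its rows
unbatched = [e.numpy().tolist()
             for e in tf.data.Dataset.from_tensors(t1).unbatch()]
sliced = [e.numpy().tolist()
          for e in tf.data.Dataset.from_tensor_slices(t1)]
print(unbatched == sliced)  # True

# Regroup the three (2,) rows into a single (3, 2) element
rebatched = list(tf.data.Dataset.from_tensor_slices(t1).batch(3))
print(len(rebatched), rebatched[0].shape)  # 1 (3, 2)
```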

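I think @MatthewScarpino clearly explained the differences between these two methods. Here I try to describe their typical usage:

from_tensors can be used to build a larger dataset out of several small datasets, i.e., the size (length) of the dataset grows;
while from_tensor_slices can be used to combine different elements into one dataset, e.g., to combine features and labels into one dataset (which is also why the first dimension of the tensors must be the same). That is, the dataset becomes "wider".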
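Applied to the original question, a minimal sketch under these assumptions: TF 2.x, plus hypothetical random data X laid out as (num_features, num_examples) and labels y of shape (num_examples,). Because from_tensor_slices splits along the first dimension, X is transposed so that examples come first, then paired with the labels.

```python
import numpy as np
import tensorflow as tf

num_features, num_examples = 4, 6
# Hypothetical data in the question's layout: (num_features, num_examples)
X = np.random.rand(num_features, num_examples).astype(np.float32)
y = np.arange(num_examples)

# Transpose to (num_examples, num_features) so axis 0 indexes examples
ds = tf.data.Dataset.from_tensor_slices((X.T, y))

for features, label in ds.take(2):
    print(features.shape, label.numpy())  # (4,) 0, then (4,) 1
```

Calling from_tensors on the same matrix would instead produce a single element holding the entire (num_features, num_examples) array.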
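@MathewScarpino: Could you elaborate on when to use which?
I think the root of the confusion (at least for me) is the name. Since from_tensor_slices creates slices from the original data... the ideal name would be "to_tensor_slices", because you take your data and create tensor slices from it. Once I started thinking along those lines, all the documentation in TF2 became very clear to me!
For me, a key piece of information missing from the documentation is that multiple tensors are passed to these methods as a tuple, e.g., from_tensors((t1, t2, t3,)). With that knowledge, from_tensors produces a dataset in which each input tensor is like a row of the dataset, while from_tensor_slices produces a dataset in which each input tensor is a column of the data; so in the latter case all tensors must have the same length, and the elements (rows) of the resulting dataset are tuples with one element from each column.
For TF 2, I get: AttributeError: 'TensorDataset' object has no attribute 'output_types'
PS: it should be tf.random.uniform instead of tf.random_uniform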
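A sketch of the row/column view from the comment above, assuming TF 2.x; t1, t2 and t3 are illustrative equal-length 1-D tensors, one per column. It inspects shapes through element_spec, which in TF 2.x takes the place of the output_types/output_shapes attributes behind that AttributeError.

```python
import tensorflow as tf

t1 = tf.constant([1, 2, 3])
t2 = tf.constant([10.0, 20.0, 30.0])
t3 = tf.constant(["a", "b", "c"])

cols = tf.data.Dataset.from_tensor_slices((t1, t2, t3))  # each tensor is a column
row = tf.data.Dataset.from_tensors((t1, t2, t3))         # the whole tuple is one element

print(cols.element_spec[0].shape, row.element_spec[0].shape)  # () (3,)

rows = list(cols)  # three tuples, one value from each column
print(len(rows), len(list(row)))  # 3 1
```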

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])
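With this 3D t1 of shape (2, 3, 2), from_tensor_slices still slices only along the first axis and leaves the inner axes untouched. A sketch assuming TF 2.x:

```python
import tensorflow as tf

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])

# from_tensors: one element, the full (2, 3, 2) tensor
whole = list(tf.data.Dataset.from_tensors(t1))
# from_tensor_slices: two elements, each a (3, 2) block
slices = list(tf.data.Dataset.from_tensor_slices(t1))

print(len(whole), whole[0].shape)    # 1 (2, 3, 2)
print(len(slices), slices[0].shape)  # 2 (3, 2)
```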