TensorFlow: how to efficiently use a tf.data.Dataset made of OrderedDicts?

With TensorFlow 2.3.1, the following code snippet fails:

import tensorflow as tf

url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.zip"

tf.keras.utils.get_file(
    origin=url,
    fname='creditcard.zip',
    cache_dir="/tmp/datasets/",
    extract=True)

ds = tf.data.experimental.make_csv_dataset(
    "/tmp/datasets/*.csv",
    batch_size=2048,
    label_name="Class",
    select_columns=["V1","V2","Class"],
    num_rows_for_inference=None,
    shuffle_buffer_size=600,
    ignore_errors=True)

model = tf.keras.Sequential(
    [
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid", name="labeling"),
    ],
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss="binary_crossentropy", 
)

model.fit(
    ds,
    steps_per_epoch=5,
    epochs=3,
)
The error stack trace is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-c79f80f9d0fd> in <module>
----> 1 model.fit(
      2     ds,
      3     steps_per_epoch=5,
      4     epochs=3,
      5 )

[...]

    ValueError: Layer sequential expects 1 inputs, but it received 2 input tensors. Inputs received: [<tf.Tensor 'ExpandDims:0' shape=(2048, 1) dtype=float32>, <tf.Tensor 'ExpandDims_1:0' shape=(2048, 1) dtype=float32>]
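For context, make_csv_dataset yields each batch as a pair (OrderedDict of per-column tensors, label), which appears to be why the Sequential model reports receiving two input tensors instead of one. A minimal sketch to confirm this on the ds built above:

# Prints something like:
# (OrderedDict([('V1', TensorSpec(...)), ('V2', TensorSpec(...))]),
#  TensorSpec(...))   # the 'Class' label
print(ds.element_spec)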
My questions for the experts:

  • Am I doing this right, or is there a better solution?
  • In terms of performance, is this solution workable for a dataset that does not fit in memory?

I am not sure whether your code fits the data in memory.

If not, you can change the code as follows:

import tensorflow as tf

url = "https://storage.googleapis.com/download.tensorflow.org/data/creditcard.zip"
ds = tf.data.experimental.make_csv_dataset(
    "/tmp/datasets/*.csv",
    batch_size=2048,
    label_name="Class",
    select_columns=["V1","V2","Class"],
    num_rows_for_inference=None,
    ignore_errors=True,
    num_epochs=1,
    shuffle_buffer_size=2048*1000,
    prefetch_buffer_size=tf.data.experimental.AUTOTUNE
)

input_list = []
for column in ["V1", "V2"]:
    # One scalar input per selected CSV column, named after the column so Keras
    # can match it to the corresponding key of the feature OrderedDict.
    _input = tf.keras.Input(shape=(1,), name=column)
    input_list.append(_input)

concat = tf.keras.layers.Concatenate(name="concat")(input_list)
dense = tf.keras.layers.Dense(256, activation="relu", name="dense", dtype='float64')(concat)
output_dense = tf.keras.layers.Dense(1, activation="sigmoid", name="labeling", dtype='float64')(dense)
model = tf.keras.Model(inputs=input_list, outputs=output_dense)

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-2),
    loss="binary_crossentropy", 
)

model.fit(
    ds,
    steps_per_epoch=5,
    epochs=10,
)
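An alternative, shown here only as a sketch (it is not part of the original answer): flatten the OrderedDict of features into a single tensor inside the input pipeline, so that the Sequential model from the question can consume the dataset unchanged. The column names are the same V1/V2 selection as above:

def stack_features(features, label):
    # Each selected column arrives as a 1-D (batch_size,) tensor; stack the two
    # columns into one (batch_size, 2) tensor (tf.concat over expanded dims
    # would work just as well).
    return tf.stack([features["V1"], features["V2"]], axis=-1), label

flat_ds = ds.map(stack_features, num_parallel_calls=tf.data.experimental.AUTOTUNE)

seq_model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid", name="labeling"),
])
seq_model.compile(optimizer=tf.keras.optimizers.Adam(1e-2), loss="binary_crossentropy")
seq_model.fit(flat_ds, steps_per_epoch=5, epochs=10)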

I edited your answer and fixed the code so that it runs. Thank you very much. When I run it on the whole dataset (~250K rows / 30 columns), which is a rather small dataset, I get an average of 110 ms/step. I am not sure whether such a test on a single machine is meaningful. With the limited knowledge I have, I lean toward your solution because it is easier to understand and maintain. For me, the tf.concat […]. Any additional comments or feedback are welcome.
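One more remark on the memory question, offered as an assumption rather than a measurement: if I read make_csv_dataset correctly, the shuffle buffer holds individual rows, so shuffle_buffer_size=2048*1000 would keep roughly two million rows in memory, which for this ~250K-row file is the entire dataset. A sketch of a more streaming-friendly configuration for data that really does not fit in memory:

# Sketch only: same pipeline, but the shuffle buffer (the main in-memory cost)
# is sized well below RAM; rows are otherwise streamed from the CSV files.
ds_streaming = tf.data.experimental.make_csv_dataset(
    "/tmp/datasets/*.csv",
    batch_size=2048,
    label_name="Class",
    select_columns=["V1", "V2", "Class"],
    num_rows_for_inference=None,
    ignore_errors=True,
    num_epochs=1,
    shuffle_buffer_size=10000,   # rows held in memory for shuffling
    prefetch_buffer_size=tf.data.experimental.AUTOTUNE,
)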