How to train a model when the dataset (TFRecord) has more than 2 element specs (features)?

Tags: python, tensorflow, dataset, tfrecord, tensorflow2

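Recently I learned how to train a model from .tfrecord files and got some decent results. However, during model training I ran into problems with a dataset that has more than 2 element specs (that is, a TFRecord with more than 2 features). I wrote a simple piece of code to reproduce it: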

import tensorflow as tf
import numpy as np
import os
from tensorflow.keras import models, losses, optimizers

buffer_size = 100
batch_size = 32


def _bytes_feature(value):
    """Returns a bytes_list from a string / byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def _float_feature(value):
    """Returns a float_list from a float / double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))


def _int64_feature(value):
    """Returns an int64_list from a bool / enum / int / uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))


def serialize_array(array):
    array = tf.io.serialize_tensor(array)
    return array


def trans_func(exam):
    feature_description = {
        'former_seq': tf.io.FixedLenFeature([], tf.string),
        'processed_latter_seq': tf.io.FixedLenFeature([], tf.string),
        'actual_latter_seq': tf.io.FixedLenFeature([], tf.string),
    }
    features = tf.io.parse_single_example(exam, feature_description)
    former_seq = tf.io.parse_tensor(features['former_seq'], tf.float32)
    processed_latter_seq = tf.io.parse_tensor(features['processed_latter_seq'], tf.float32)
    actual_latter_seq = tf.io.parse_tensor(features['actual_latter_seq'], tf.float32)
    return former_seq, processed_latter_seq, actual_latter_seq


# write and read .tfrecord
# Cast to float32 so the stored tensors match the dtype passed to tf.io.parse_tensor in trans_func.
X = np.random.normal(size=(32, 28, 28, 5)).astype(np.float32)
Y = np.random.normal(size=(32, 28, 28, 3)).astype(np.float32)
Ymin, Ymax = tf.reduce_min(Y), tf.reduce_max(Y)
reduced_Y = (Y - Ymin) / (Ymax - Ymin)  # min-max normalization to [0, 1]

writer = tf.io.TFRecordWriter('test.tfrecords')
feature = {'former_seq': _bytes_feature(serialize_array(X)),
           'processed_latter_seq': _bytes_feature(serialize_array(reduced_Y)),
           'actual_latter_seq': _bytes_feature(serialize_array(Y)),
           }
example = tf.train.Example(features=tf.train.Features(feature=feature))
writer.write(example.SerializeToString())
writer.close()

dataset = tf.data.TFRecordDataset('test.tfrecords')
dataset = dataset.map(trans_func)
dataset = dataset.shuffle(buffer_size)

# create a simple model
inputs = tf.keras.Input((28, 28, 5))  # match the 28x28x5 samples written above
outputs = tf.keras.layers.Conv2D(filters=3, kernel_size=3, padding='same')(inputs)
simple_model = tf.keras.models.Model(inputs, outputs)
simple_model.compile(optimizer=optimizers.Adam(), loss=losses.MAE, metrics=['mse'])
simple_model.summary()
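
Sorry that the code is a bit long. As shown, during preprocessing I apply min-max normalization to Y to get reduced_Y. The model then computes predicted_reduced_Y (not shown in the code), and after undoing the normalization on predicted_reduced_Y I finally obtain predicted_Y.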
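
The normalization round trip described above can be sketched with NumPy alone. This is only a minimal sketch, assuming the standard min-max formula (Y - Ymin) / (Ymax - Ymin). The helper names `min_max_normalize` and `min_max_restore` are hypothetical and not part of the original code, and `reduced_Y` stands in for the model's prediction.

```python
import numpy as np


def min_max_normalize(y):
    """Scale y into [0, 1] and return the statistics needed to undo it."""
    y_min, y_max = y.min(), y.max()
    return (y - y_min) / (y_max - y_min), y_min, y_max


def min_max_restore(reduced_y, y_min, y_max):
    """Invert min_max_normalize."""
    return reduced_y * (y_max - y_min) + y_min


rng = np.random.default_rng(0)
Y = rng.normal(size=(32, 28, 28, 3)).astype(np.float32)

reduced_Y, y_min, y_max = min_max_normalize(Y)
# In the real pipeline the model would output predicted_reduced_Y;
# here reduced_Y itself stands in for that prediction.
predicted_Y = min_max_restore(reduced_Y, y_min, y_max)
```

Restoring requires the same `y_min` and `y_max` that were used for normalization, so they have to be kept somewhere alongside the data.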

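So the problem is that trans_func returns three tensors, which gives the dataset three element specs (X, reduced_Y, Y), whereas model.fit seems to support only two. I tried a couple of calls, but both failed: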

simple_model.fit(dataset, epochs=5)
simple_model.fit(dataset.element_spec[0], dataset.element_spec[1], epochs=5)
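
For context, Keras `Model.fit` reads the elements of a `tf.data.Dataset` as either `(inputs, targets)` or `(inputs, targets, sample_weights)`, so the third component of a three-element dataset is taken as sample weights rather than as extra data. `dataset.element_spec` only describes shapes and dtypes and holds no data, so it cannot be passed as `x` and `y`. One possible approach, sketched below under the assumption that only X and reduced_Y are needed for training, is to add a `map` that drops the third component just before `fit`. The in-memory arrays and the names `full_ds` and `train_ds` are illustrative stand-ins, not part of the original code.

```python
import numpy as np
import tensorflow as tf

# Stand-in data with the same layout as the question: (X, reduced_Y, Y).
X = np.random.normal(size=(32, 28, 28, 5)).astype(np.float32)
Y = np.random.normal(size=(32, 28, 28, 3)).astype(np.float32)
reduced_Y = (Y - Y.min()) / (Y.max() - Y.min())

full_ds = tf.data.Dataset.from_tensor_slices((X, reduced_Y, Y))

# Keep only (inputs, targets) for training; full_ds still carries all three.
train_ds = full_ds.map(lambda x, reduced_y, y: (x, reduced_y)).batch(8)

inputs = tf.keras.Input((28, 28, 5))
outputs = tf.keras.layers.Conv2D(filters=3, kernel_size=3, padding='same')(inputs)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mae', metrics=['mse'])

history = model.fit(train_ds, epochs=1, verbose=0)
```

Because `map` returns a new dataset, `full_ds` keeps all three components and can still be iterated separately, for example to compare restored predictions against the actual Y.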
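
I do have a few clumsy workarounds for this problem. For example, iterate over the dataset with a for loop to collect X, reduced_Y and Y, and then call
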

simple_model.fit(X,reduced_Y, epochs=5)
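
Another option is to store only X and Y in test.tfrecords and have trans_func perform the min-max normalization and return X, reduced_Y. However, I am not satisfied with either of them, because they sidestep the case where the dataset contains more than 2 element specs. I would just like to find a good solution.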
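
The second workaround above could look roughly like the following. This is a hedged sketch rather than the asker's code: the map function name `normalize_targets` is hypothetical, and the data is kept in memory instead of being read from a TFRecord file. It uses a single dataset element holding the whole array, mirroring the single record written in the question, so the per-element minimum and maximum coincide with the global ones.

```python
import numpy as np
import tensorflow as tf

X = np.random.normal(size=(32, 28, 28, 5)).astype(np.float32)
Y = np.random.normal(size=(32, 28, 28, 3)).astype(np.float32)

# One element holding the whole array, like the single record in the question.
raw_ds = tf.data.Dataset.from_tensors((X, Y))


def normalize_targets(x, y):
    """Hypothetical map function: min-max normalize y on the fly."""
    y_min, y_max = tf.reduce_min(y), tf.reduce_max(y)
    return x, (y - y_min) / (y_max - y_min)


pair_ds = raw_ds.map(normalize_targets)
x_batch, reduced_y_batch = next(iter(pair_ds))
```

If each record held a single sample instead, this map would normalize every sample by its own minimum and maximum, which is not the same as normalizing with dataset-wide statistics.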

Thanks a lot in advance.