SQL: Building a TensorFlow dataset with a database as the data source
I have to create a data input pipeline with TensorFlow's tf.data. The data sources are a MongoDB and a SQL Server database. How can I create a tf.data object from a database? All the articles I have seen use .tfrecords or .csv files as the data source for TensorFlow. Thanks in advance.
Thanks for your question. Retrieve the data from the database and store it as a NumPy array. If the array is too large to fit in memory, try using a memmap array (see the sketch below).
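A minimal sketch of that retrieval step, assuming pymongo for MongoDB and pyodbc for SQL Server (the connection strings, database, collection, table, and field names below are all placeholders):

import numpy as np
import pymongo
import pyodbc

# --- MongoDB -> NumPy ---
client = pymongo.MongoClient("mongodb://localhost:27017")
docs = client["dbname"]["collname"].find()
mongo_array = np.asarray([[d["feature1"], d["feature2"]] for d in docs],
                         dtype=np.float32)

# --- SQL Server -> NumPy ---
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=dbname;UID=user;PWD=password")
cursor = conn.cursor()
cursor.execute("SELECT feature1, feature2 FROM some_table")
sql_array = np.asarray([list(row) for row in cursor.fetchall()], dtype=np.float32)

# If the data does not fit in RAM, copy it into a memory-mapped file instead
mm = np.memmap("features.dat", dtype=np.float32, mode="w+", shape=sql_array.shape)
mm[:] = sql_array
mm.flush()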
Then create a generator; below is an example from my own code for images and their one-hot encodings (TF 1.x-style, using placeholders and sessions):

import tensorflow as tf
from tensorflow.keras import backend as K


def tf_augmented_image_generator(images,
                                 onehots,
                                 batch_size,
                                 map_fn,
                                 shuffle_size=1000,
                                 num_parallel_calls=tf.data.experimental.AUTOTUNE):
    """
    Create a generator using a tf.data.Dataset with augmentation via a map function.
    The generator can then be used for training in model.fit_generator.

    The map function must consist of TensorFlow operators (not numpy). On Windows
    machines this leads to faster augmentation, as there are problems performing
    augmentation in parallel when multiprocessing is enabled in model.fit /
    model.fit_generator and the default Keras numpy-based augmentation is used,
    e.g. in ImageDataGenerator.

    :param images: Images to augment
    :param onehots: One-hot encoding of the target class
    :param batch_size: Batch size for training
    :param map_fn: The augmentation map function
    :param shuffle_size: Size of the shuffle buffer. Smaller values reduce memory consumption.
    :param num_parallel_calls: Number of parallel calls; the default is automatic tuning.
    :return: A generator yielding (inputs, labels) batches
    """
    # Get shapes from the input data; the leading None allows a variable batch size
    img_size = images.shape
    img_size = (None, img_size[1], img_size[2], img_size[3])
    onehot_size = onehots.shape
    onehot_size = (None, onehot_size[1])
    images_tensor = tf.placeholder(tf.float32, shape=img_size)
    onehots_tensor = tf.placeholder(tf.float32, shape=onehot_size)

    # Create the dataset: slice, augment, shuffle, repeat, batch, prefetch
    dataset = tf.data.Dataset.from_tensor_slices((images_tensor, onehots_tensor))
    if map_fn is not None:
        dataset = dataset.map(lambda x, y: (map_fn(x), y),
                              num_parallel_calls=num_parallel_calls)
    dataset = dataset.shuffle(shuffle_size, reshuffle_each_iteration=True).repeat()
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)

    iterator = dataset.make_initializable_iterator()
    init_op = iterator.initializer
    next_val = iterator.get_next()

    # Feed the NumPy (or memmap) arrays in once, then yield batches indefinitely
    with K.get_session().as_default() as sess:
        sess.run(init_op, feed_dict={images_tensor: images, onehots_tensor: onehots})
        while True:
            inputs, labels = sess.run(next_val)
            yield inputs, labels
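A hypothetical call site for this generator, assuming model is a compiled Keras model, train_images / train_onehots are the NumPy arrays built from the database, and augment_fn is a stand-in for a TF-ops augmentation function:

def augment_fn(image):
    # TF-only augmentation, e.g. a random horizontal flip
    return tf.image.random_flip_left_right(image)

gen = tf_augmented_image_generator(train_images, train_onehots,
                                   batch_size=32, map_fn=augment_fn)
model.fit_generator(gen,
                    steps_per_epoch=len(train_images) // 32,
                    epochs=10)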
Alternatively, use a MongoDB-backed dataset source directly:

import tensorflow as tf
from tensorflow.python.data.ops import iterator_ops

# MongoDBDataset is assumed to be a custom/third-party tf.data source backed by
# a MongoDB collection, and _parse_line a user-defined record-parsing function
dataset = MongoDBDataset("dbname", "collname")
dataset = dataset.map(_parse_line)
repeat_dataset2 = dataset.repeat()
batch_dataset = repeat_dataset2.batch(20)

iterator = iterator_ops.Iterator.from_structure(batch_dataset.output_types)
# init_op = iterator.make_initializer(dataset)
init_batch_op = iterator.make_initializer(batch_dataset)
get_next = iterator.get_next()

with tf.Session() as sess:
    sess.run(init_batch_op, feed_dict={})
    for i in range(5):
        print(sess.run(get_next))
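If no such MongoDBDataset op is available to you, a similar pipeline can be sketched with tf.data.Dataset.from_generator, streaming documents straight out of pymongo (the "features" and "label" field names and their types are assumptions):

import numpy as np
import pymongo
import tensorflow as tf

def mongo_generator():
    # Stream documents one at a time instead of materializing the collection
    client = pymongo.MongoClient("mongodb://localhost:27017")
    for doc in client["dbname"]["collname"].find():
        yield (np.asarray(doc["features"], dtype=np.float32),
               np.float32(doc["label"]))

dataset = tf.data.Dataset.from_generator(
    mongo_generator,
    output_types=(tf.float32, tf.float32))
dataset = dataset.repeat().batch(20)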