SQL: Building a TensorFlow dataset with a database as the data source
I have to create a data input pipeline with TensorFlow's tf.data. The data sources are a MongoDB and a SQL Server database. How can I create a tf.data object from a database? All the articles I have seen use .tfrecords or .csv files as the data source for TensorFlow. Thanks in advance.
Thanks for your question. Retrieve the data from the database and store it as a NumPy array. If the array is too large to fit in memory, try using a memmap array (see the sketch below).
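A minimal sketch of that retrieval step, assuming pymongo for MongoDB and pyodbc for SQL Server (the connection strings, database, collection, table, and field names below are all placeholders):

import numpy as np
import pymongo
import pyodbc

# --- MongoDB -> NumPy ---
client = pymongo.MongoClient("mongodb://localhost:27017")
docs = client["dbname"]["collname"].find()
mongo_array = np.asarray([[d["feature1"], d["feature2"]] for d in docs],
                         dtype=np.float32)

# --- SQL Server -> NumPy ---
conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=localhost;DATABASE=dbname;UID=user;PWD=password")
cursor = conn.cursor()
cursor.execute("SELECT feature1, feature2 FROM some_table")
sql_array = np.asarray([list(row) for row in cursor.fetchall()], dtype=np.float32)

# If the data does not fit in RAM, copy it into a memory-mapped file instead
mm = np.memmap("features.dat", dtype=np.float32, mode="w+", shape=sql_array.shape)
mm[:] = sql_array
mm.flush()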
Then create a generator; below is an example from my own code for images and their one-hot encodings (TF 1.x-style, using placeholders and sessions):

import tensorflow as tf
from tensorflow.keras import backend as K


def tf_augmented_image_generator(images,
                                 onehots,
                                 batch_size,
                                 map_fn,
                                 shuffle_size=1000,
                                 num_parallel_calls=tf.data.experimental.AUTOTUNE):
    """
    Create a generator using a tf.data.Dataset with augmentation via a map function.
    The generator can then be used for training in model.fit_generator.

    The map function must consist of TensorFlow operators (not numpy). On Windows
    machines this leads to faster augmentation, as there are problems performing
    augmentation in parallel when multiprocessing is enabled in model.fit /
    model.fit_generator and the default Keras numpy-based augmentation is used,
    e.g. in ImageDataGenerator.

    :param images: Images to augment
    :param onehots: One-hot encoding of the target class
    :param batch_size: Batch size for training
    :param map_fn: The augmentation map function
    :param shuffle_size: Size of the shuffle buffer. Smaller values reduce memory consumption.
    :param num_parallel_calls: Number of parallel calls; the default is automatic tuning.
    :return: A generator yielding (inputs, labels) batches
    """
    # Get shapes from the input data; the leading None allows a variable batch size
    img_size = images.shape
    img_size = (None, img_size[1], img_size[2], img_size[3])
    onehot_size = onehots.shape
    onehot_size = (None, onehot_size[1])
    images_tensor = tf.placeholder(tf.float32, shape=img_size)
    onehots_tensor = tf.placeholder(tf.float32, shape=onehot_size)

    # Create the dataset: slice, augment, shuffle, repeat, batch, prefetch
    dataset = tf.data.Dataset.from_tensor_slices((images_tensor, onehots_tensor))
    if map_fn is not None:
        dataset = dataset.map(lambda x, y: (map_fn(x), y),
                              num_parallel_calls=num_parallel_calls)
    dataset = dataset.shuffle(shuffle_size, reshuffle_each_iteration=True).repeat()
    dataset = dataset.batch(batch_size)
    dataset = dataset.prefetch(1)

    iterator = dataset.make_initializable_iterator()
    init_op = iterator.initializer
    next_val = iterator.get_next()

    # Feed the NumPy (or memmap) arrays in once, then yield batches indefinitely
    with K.get_session().as_default() as sess:
        sess.run(init_op, feed_dict={images_tensor: images, onehots_tensor: onehots})
        while True:
            inputs, labels = sess.run(next_val)
            yield inputs, labels
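A hypothetical call site for this generator, assuming model is a compiled Keras model, train_images / train_onehots are the NumPy arrays built from the database, and augment_fn is a stand-in for a TF-ops augmentation function:

def augment_fn(image):
    # TF-only augmentation, e.g. a random horizontal flip
    return tf.image.random_flip_left_right(image)

gen = tf_augmented_image_generator(train_images, train_onehots,
                                   batch_size=32, map_fn=augment_fn)
model.fit_generator(gen,
                    steps_per_epoch=len(train_images) // 32,
                    epochs=10)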
Alternatively, use a MongoDB-backed dataset source directly:

import tensorflow as tf
from tensorflow.python.data.ops import iterator_ops

# MongoDBDataset is assumed to be a custom/third-party tf.data source backed by
# a MongoDB collection, and _parse_line a user-defined record-parsing function
dataset = MongoDBDataset("dbname", "collname")
dataset = dataset.map(_parse_line)
repeat_dataset2 = dataset.repeat()
batch_dataset = repeat_dataset2.batch(20)

iterator = iterator_ops.Iterator.from_structure(batch_dataset.output_types)
# init_op = iterator.make_initializer(dataset)
init_batch_op = iterator.make_initializer(batch_dataset)
get_next = iterator.get_next()

with tf.Session() as sess:
    sess.run(init_batch_op, feed_dict={})
    for i in range(5):
        print(sess.run(get_next))
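If no such MongoDBDataset op is available to you, a similar pipeline can be sketched with tf.data.Dataset.from_generator, streaming documents straight out of pymongo (the "features" and "label" field names and their types are assumptions):

import numpy as np
import pymongo
import tensorflow as tf

def mongo_generator():
    # Stream documents one at a time instead of materializing the collection
    client = pymongo.MongoClient("mongodb://localhost:27017")
    for doc in client["dbname"]["collname"].find():
        yield (np.asarray(doc["features"], dtype=np.float32),
               np.float32(doc["label"]))

dataset = tf.data.Dataset.from_generator(
    mongo_generator,
    output_types=(tf.float32, tf.float32))
dataset = dataset.repeat().batch(20)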