TensorFlow: writing and reading a SparseTensor in a tfrecord file


Is it possible to do this elegantly?

Right now the only way I can think of is to save the indices (tf.int64), values (tf.float32), and shape (tf.int64) of the SparseTensor in 3 separate features (the first two being VarLenFeature and the last being FixedLenFeature). This seems really cumbersome.
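For concreteness, a minimal sketch of that 3-feature approach (TF 1.x API; the helper name sparse_to_example and the rank-2 assumption are mine):

import numpy as np
import tensorflow as tf

def sparse_to_example(indices, values, dense_shape):
    # one Feature each for the indices, values, and shape of the SparseTensor
    return tf.train.Example(features=tf.train.Features(feature={
        'indices': tf.train.Feature(int64_list=tf.train.Int64List(value=np.asarray(indices).ravel())),
        'values': tf.train.Feature(float_list=tf.train.FloatList(value=values)),
        'shape': tf.train.Feature(int64_list=tf.train.Int64List(value=dense_shape)),
    }))

# parsing spec: the first two are VarLenFeature, the last is FixedLenFeature
feature_spec = {
    'indices': tf.VarLenFeature(tf.int64),
    'values': tf.VarLenFeature(tf.float32),
    'shape': tf.FixedLenFeature([2], dtype=tf.int64),  # assuming a rank-2 tensor
}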

Any suggestion would be appreciated.

Update 1: My answer below is not suitable for building a computation graph (because the contents of the sparse tensor have to be extracted via sess.run(), which costs a lot of time if called repeatedly).


Inspired by this, I thought maybe we could get the bytes generated by tf.serialize_sparse so that later we can recover the SparseTensor using tf.deserialize_many_sparse. But tf.serialize_sparse is not implemented in pure Python (it calls the external function SerializeSparse), which means we would still need to use sess.run() to get the bytes. How can I get a pure Python version of it? Thanks.
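For context, this is the round trip I have in mind (a sketch against the TF 1.x API; the sess.run() below is exactly the call I would like to avoid):

import tensorflow as tf

st = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1.0, 2.0], dense_shape=[3, 4])
serialized = tf.serialize_sparse(st)  # 1-D string tensor with 3 elements (indices, values, shape)

with tf.Session() as sess:
    sst = sess.run(serialized)  # <- the sess.run() call I want to avoid

# the bytes could later be restored with tf.deserialize_many_sparse,
# which expects a batched [N, 3] tensor of serialized sparse tensors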

Since TensorFlow currently only supports 3 types in tfrecord (Float, Int64 and Bytes), and a SparseTensor usually involves more than one type, my solution is to convert the SparseTensor to Bytes with Pickle.

Here is the sample code:

import tensorflow as tf
import pickle
import numpy as np
from scipy.sparse import csr_matrix

#---------------------------------#
# Write to a tfrecord file

# create two sparse matrices (simulate the values from .eval() of SparseTensor)
a = csr_matrix(np.arange(12).reshape((4,3)))
b = csr_matrix(np.random.rand(20).reshape((5,4)))

# convert them to pickle bytes
p_a = pickle.dumps(a)
p_b = pickle.dumps(b)

# put the bytes in context_list and feature_list
## save p_a in context_lists 
context_lists = tf.train.Features(feature={
    'context_a': tf.train.Feature(bytes_list=tf.train.BytesList(value=[p_a]))
    })
## save p_b as a one element sequence in feature_lists
p_b_features = [tf.train.Feature(bytes_list=tf.train.BytesList(value=[p_b]))]
feature_lists = tf.train.FeatureLists(feature_list={
    'features_b': tf.train.FeatureList(feature=p_b_features)
    })

# create the SequenceExample
SeqEx = tf.train.SequenceExample(
    context = context_lists,
    feature_lists = feature_lists
    )
SeqEx_serialized = SeqEx.SerializeToString()

# write to a tfrecord file
tf_FWN = 'test_pickle1.tfrecord'
tf_writer1 = tf.python_io.TFRecordWriter(tf_FWN)
tf_writer1.write(SeqEx_serialized)
tf_writer1.close()

#---------------------------------#
# Read from the tfrecord file

# first, define the parse function
def _parse_SE_test_pickle1(in_example_proto):
    context_features = {
        'context_a': tf.FixedLenFeature([], dtype=tf.string)
        }
    sequence_features = {
        'features_b': tf.FixedLenSequenceFeature([1], dtype=tf.string)
        }
    context, sequence = tf.parse_single_sequence_example(
      in_example_proto, 
      context_features=context_features,
      sequence_features=sequence_features
      )
    p_a_tf = context['context_a']
    p_b_tf = sequence['features_b']

    return tf.tuple([p_a_tf, p_b_tf])

# use the Dataset API to read
dataset = tf.data.TFRecordDataset(tf_FWN)
dataset = dataset.map(_parse_SE_test_pickle1)
dataset = dataset.batch(1)
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
sess.run(iterator.initializer)

[p_a_bat, p_b_bat] = sess.run(next_element)

# 1st index refers to the batch; 2nd and 3rd indices refer to the sequence position (only for b)
rec_a = pickle.loads(p_a_bat[0])
rec_b = pickle.loads(p_b_bat[0][0][0])

# check whether the recovered matrices are the same as the original ones
assert((rec_a - a).nnz == 0)
assert((rec_b - b).nnz == 0)

# print the contents
print("\n------ a -------")
print(a.todense())
print("\n------ rec_a -------")
print(rec_a.todense())
print("\n------ b -------")
print(b.todense())
print("\n------ rec_b -------")
print(rec_b.todense())
Here is what I got:

------ a -------
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

------ rec_a -------
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

------ b -------
[[ 0.88612402  0.51438017  0.20077887  0.20969243]
 [ 0.41762425  0.47394715  0.35596051  0.96074408]
 [ 0.35491739  0.0761953   0.86217511  0.45796474]
 [ 0.81253723  0.57032448  0.94959189  0.10139615]
 [ 0.92177499  0.83519464  0.96679833  0.41397829]]

------ rec_b -------
[[ 0.88612402  0.51438017  0.20077887  0.20969243]
 [ 0.41762425  0.47394715  0.35596051  0.96074408]
 [ 0.35491739  0.0761953   0.86217511  0.45796474]
 [ 0.81253723  0.57032448  0.94959189  0.10139615]
 [ 0.92177499  0.83519464  0.96679833  0.41397829]]
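If the recovered csr_matrix then needs to go back into TensorFlow as a tf.SparseTensor, one possible sketch (via scipy's COO format; note that tf.SparseTensor expects int64 indices):

coo = rec_a.tocoo()
# stack the row/column coordinates into an [nnz, 2] index array
indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
st_a = tf.SparseTensor(indices=indices, values=coo.data, dense_shape=coo.shape)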

I ran into the problem of writing and reading sparse tensors to and from TFRecord files, and I found very little related information online.

As you suggested, one solution is to store the indices, values and shape of the SparseTensor in 3 separate features, which is discussed here. This seems neither efficient nor elegant.

I have a working example (with tensorflow 2.0.0.alpha0). Maybe not the most elegant, but it seems to work.

import tensorflow as tf
import numpy as np

# example data
st_1 = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[3, 4])
st_2 = tf.SparseTensor(indices=[[0, 1], [2, 0], [3, 3]], values=[3, 9, 5], dense_shape=[4, 4])
sparse_tensors = [st_1, st_2]

# serialize the sparse tensors into an array of byte strings
serialized_sparse_tensors = [tf.io.serialize_sparse(st).numpy() for st in sparse_tensors]

# write to TFRecord
with tf.io.TFRecordWriter('sparse_example.tfrecord') as tfwriter:
    for sst in serialized_sparse_tensors:
        sparse_example = tf.train.Example(features=
            tf.train.Features(feature=
                {'sparse_tensor':
                    tf.train.Feature(bytes_list=tf.train.BytesList(value=sst))
                }))
        # append each example to the tfrecord
        tfwriter.write(sparse_example.SerializeToString())

def parse_fn(data_element):
    features = {'sparse_tensor': tf.io.FixedLenFeature([3], tf.string)}
    parsed = tf.io.parse_single_example(data_element, features=features)

    # deserialize_many_sparse() requires the dimensions to be [N, 3], so we add one dimension with expand_dims
    parsed['sparse_tensor'] = tf.expand_dims(parsed['sparse_tensor'], axis=0)
    # deserialize the sparse tensor
    parsed['sparse_tensor'] = tf.io.deserialize_many_sparse(parsed['sparse_tensor'], dtype=tf.int32)
    # convert from sparse to dense
    parsed['sparse_tensor'] = tf.sparse.to_dense(parsed['sparse_tensor'])
    # remove the extra dimension: [1, d0, d1] -> [d0, d1]
    parsed['sparse_tensor'] = tf.squeeze(parsed['sparse_tensor'])
    return parsed

# read from the TFRecord
dataset = tf.data.TFRecordDataset(['sparse_example.tfrecord'])
dataset = dataset.map(parse_fn)
# pad and batch the dataset
dataset = dataset.padded_batch(2, padded_shapes={'sparse_tensor': [None, None]})

next(iter(dataset))
This produces:

{'sparse_tensor': <tf.Tensor: shape=(2, 4, 4), dtype=int32, numpy=...>}

Would it be possible to add axis=0 to the tf.squeeze() call, to make sure that only the expanded dimension is removed?
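For illustration, the change being suggested would look like this (with an explicit axis, tf.squeeze only drops the named dimension and raises an error if its size is not 1):

# only remove the dimension added by expand_dims: [1, d0, d1] -> [d0, d1]
parsed['sparse_tensor'] = tf.squeeze(parsed['sparse_tensor'], axis=0)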