Python: Is there a way to configure the output shape of an RNN?

Tags: python, tensorflow, recurrent-neural-network

I'm trying to create an RNN that guesses the notes being played on a piano, given a sound file of piano notes (WAV format). I'm currently cutting the WAV clips into 10-second chunks (2D) and zero-padding shorter ones out to 10 seconds, so the inputs are all uniform. However, when I pass a clip to the RNN, it gives an output of a smaller dimension (1D) (that's when taking the last state - should I be taking the sequence of states instead?).
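For reference, a minimal sketch of the zero-padding step described above, assuming each clip has already been converted to a [time, frequency] NumPy array; pad_clip and max_steps are hypothetical names, not part of the original code:

import numpy as np

def pad_clip(frames, max_steps):
    # Zero-pad a [time, frequency] clip up to max_steps rows; truncate longer clips.
    # (Illustrative only; the names here are assumptions.)
    if frames.shape[0] >= max_steps:
        return frames[:max_steps]
    pad_rows = max_steps - frames.shape[0]
    return np.pad(frames, ((0, pad_rows), (0, 0)), mode='constant')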

I created a simpler RNN that analyzes a single-note file (2D) and produces a single output (1D), and that worked. However, when I try to apply the same technique to full clips containing multiple notes and note starts/stops, it seems to fall apart, because I can't seem to change the output shape.

import tensorflow as tf
from tensorflow.contrib import rnn

def weight_variable(shape):
    # Weight matrix initialized from N(0, 0.01)
    initer = tf.truncated_normal_initializer(stddev=0.01)
    return tf.get_variable('W', dtype=tf.float32, shape=shape, initializer=initer)

def bias_variable(shape):
    # Bias vector initialized to zero
    initial = tf.constant(0., shape=shape, dtype=tf.float32)
    return tf.get_variable('b', dtype=tf.float32, initializer=initial)

def RNN(x, weights, biases, timesteps, num_hidden):
    # Split the [batch, time, frequency] tensor into a list of `timesteps` tensors
    x = tf.unstack(x, timesteps, 1)

    # Define an LSTM cell with TensorFlow
    lstm_cell = rnn.LSTMCell(num_hidden)
    # static_rnn returns the per-timestep outputs and the final (c, h) state
    states_series, current_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Project only the final hidden state -> a single prediction per clip
    return tf.matmul(current_state[1], weights) + biases
    # return [tf.matmul(temp, weights) + biases for temp in states_series]
    # does this even make sense

# x is for data, y is for targets; shapes are [index, time, frequency] and [index, time, output note(s)] respectively
x_train, x_valid, y_train, y_valid = load_data() # removed test
print("Size of:")
print("- Training-set:\t\t{}".format(y_train.shape[0]))
print("- Validation-set:\t{}".format(y_valid.shape[0]))
# print("- Test-set\t{}".format(len(y_test)))

learning_rate = 0.001 # The optimization initial learning rate
epochs = 1000         # Total number of training epochs
batch_size = 100      # Training batch size
display_freq = 100    # Frequency of displaying the training results
threshold = 0.7       # Threshold for determining a "note"
num_hidden_units = 15 # Number of hidden units of the RNN

# Placeholders for inputs (x) and outputs(y)
x = tf.placeholder(tf.float32, shape=(None, stepCount, num_input))
y = tf.placeholder(tf.float32, shape=(None, stepCount, n_classes)) 

# create weight matrix initialized randomly from N~(0, 0.01)
W = weight_variable(shape=[num_hidden_units, n_classes])

# create bias vector initialized as zero
b = bias_variable(shape=[n_classes])

output_logits = RNN(x, W, b, stepCount, num_hidden_units)
y_pred = tf.nn.softmax(output_logits)

# Define the loss function, optimizer, and accuracy, etc.
# (code removed, irrelevant)

# Creating the op for initializing all variables
init = tf.global_variables_initializer()

sess = tf.InteractiveSession()
sess.run(init)
global_step = 0
# Lists for tracking training/validation loss and accuracy (appended to below)
testLoss, testAcc = [], []
validLoss, validAcc = [], []
# Number of training iterations in each epoch
num_tr_iter = int(y_train.shape[0] / batch_size)
for epoch in range(epochs):
    print('Training epoch: {}'.format(epoch + 1))
    x_train, y_train = randomize(x_train, y_train)
    for iteration in range(num_tr_iter):
        global_step += 1
        start = iteration * batch_size
        end = (iteration + 1) * batch_size
        x_batch, y_batch = get_next_batch(x_train, y_train, start, end)
        # Run optimization op (backprop)
        feed_dict_batch = {x: x_batch, y: y_batch}
        sess.run(optimizer, feed_dict=feed_dict_batch)

        if iteration % display_freq == 0:
            # Calculate and display the batch loss and accuracy
            loss_batch, acc_batch = sess.run([loss, accuracy],
                                             feed_dict=feed_dict_batch)

            print("iter {0:3d}:\t Loss={1:.2f},\tTraining Accuracy={2:.01%}".
                  format(iteration, loss_batch, acc_batch))
            testLoss.append(loss_batch)
            testAcc.append(acc_batch)

    # Run validation after every epoch

    feed_dict_valid = {x: x_valid[:1000].reshape((-1, stepCount, num_input)), y: y_valid[:1000]}
    loss_valid, acc_valid = sess.run([loss, accuracy], feed_dict=feed_dict_valid)
    print('---------------------------------------------------------')
    print("Epoch: {0}, validation loss: {1:.2f}, validation accuracy: {2:.01%}".
          format(epoch + 1, loss_valid, acc_valid))
    print('---------------------------------------------------------')
    validLoss.append(loss_valid)
    validAcc.append(acc_valid)

Currently, this outputs a 1D array of predictions, which really doesn't make sense for my scenario, but I'm not sure how to change it (it should output a prediction per timestep, i.e. a prediction of which note(s) are being played at each moment).
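One way to get a prediction for every timestep, shown here as a minimal sketch rather than a drop-in fix, is to project each element of the per-timestep output list returned by static_rnn instead of only the final state. This assumes the same weight and bias shapes as above ([num_hidden, n_classes] and [n_classes]); RNN_per_step is a hypothetical name:

import tensorflow as tf
from tensorflow.contrib import rnn

def RNN_per_step(x, weights, biases, timesteps, num_hidden):
    # Split [batch, time, frequency] into a list of `timesteps` tensors of shape [batch, frequency]
    x = tf.unstack(x, timesteps, 1)
    lstm_cell = rnn.LSTMCell(num_hidden)
    # `outputs` is a list of length `timesteps`, each element of shape [batch, num_hidden]
    outputs, final_state = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)
    # Apply the same projection to every timestep's output, not just the last state
    logits_per_step = [tf.matmul(h, weights) + biases for h in outputs]
    # Re-stack along the time axis -> [batch, timesteps, n_classes], matching the shape of y
    return tf.stack(logits_per_step, axis=1)

Stacking along axis=1 restores the time dimension, so the logits line up with the [index, time, output note(s)] targets, and the softmax and loss would then be applied per timestep.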

Has your problem been solved yet? If not, we could try to work through it if the full code and data are shared (only if they can be shared). Thanks.

I solved it, thanks!