Python ValueError: Tried to convert 'tensor' to a tensor and failed. Error: Argument must be a dense tensor

Tags: python, tensorflow, neural-network, reinforcement-learning

When I remove the line

tf.reshape(rewards_list, [-1, 25])
I get an error saying

ValueError: Cannot feed value of shape (1, 1, 25) for Tensor 'Placeholder_3:0', which has shape '(?, 25)'
But when I put it back in, I get the error message from the title:

ValueError: Tried to convert 'tensor' to a tensor and failed. Error: Argument must be a dense tensor: [array([[0.4758947]], dtype=float32)] - got shape [1, 1, 1], but wanted [1].
I don't understand what is going on. How can rewards_list be both of these shapes?
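
As a minimal sketch (not from the original post), both shapes can be reproduced with plain NumPy; the concrete values below are assumptions taken from the error messages and the code that follows:

import numpy as np

# sample_op returns a (1, 1) array, so indexing r with `action` keeps those axes;
# the value here is the one reported in the dense-tensor error.
reward = np.array([[0.4758947]], dtype=np.float32)

rewards_list = [reward]
print(np.array(rewards_list).shape)         # (1, 1, 1) -> "got shape [1, 1, 1], but wanted [1]"

observation = np.zeros(25, dtype=np.float32)  # stand-in for r[current_stop]
observations_list = [observation]
print(np.array([observations_list]).shape)  # (1, 1, 25) -> cannot be fed to shape (?, 25)

If this reading is right, the two errors come from two different values: the (1, 1, 25) complaint is about the doubly wrapped observations feed, while the dense-tensor error is tf.reshape failing on a list that holds a (1, 1) array instead of plain scalars.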

import random
import numpy as np
import tensorflow as tf

observations = tf.placeholder('float32', shape=[None, num_stops]) # Current game states : r[stop], r[next_stop], r[third_stop]
actions = tf.placeholder('int32',shape=[None])  # 0 - num-stops for actions taken
rewards = tf.placeholder('float32',shape=[None])  # +1, -1 with discounts

# Model
Y = tf.layers.dense(observations, 200, activation=tf.nn.relu)
Ylogits = tf.layers.dense(Y, num_stops)

# sample an action from predicted probabilities
sample_op = tf.random.categorical(logits=Ylogits, num_samples=1)


# loss
cross_entropies = tf.losses.softmax_cross_entropy(onehot_labels=tf.one_hot(actions,num_stops), logits=Ylogits)

loss = tf.reduce_sum(rewards * cross_entropies)

# training operation
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=.99)
train_op = optimizer.minimize(loss)

visited_stops = []
steps = 0

with tf.Session() as sess:

    sess.run(tf.global_variables_initializer())

    # Start at a random stop, initialize done to false
    current_stop = random.randint(0, len(r) - 1)
    done = False

    # reset everything    
    while not done: # play a game in x steps   

        observations_list = []
        actions_list = []
        rewards_list = []

        # List all stops and their scores
        observation = r[current_stop]

        # Add the stop to a list of non-visited stops if it isn't
        # already there
        if current_stop not in visited_stops:
            visited_stops.append(current_stop)

        # decide where to go
        action = sess.run(sample_op, feed_dict={observations: [observation]})

        # play it, output next state, reward if we got a point, and whether the game is over
        #game_state, reward, done, info = pong_sim.step(action)
        new_stop = int(action)


        reward = r[current_stop][action]

        if len(visited_stops) == num_stops:
            done = True

        if steps >= BATCH_SIZE:
            done = True

        steps += 1

        observations_list.append(observation)
        actions_list.append(action)
        rewards_list.append(reward)

        #rewards_list = np.reshape(rewards, [-1, 25])
        current_stop = new_stop

    #processed_rewards = discount_rewards(rewards, args.gamma)
    #processed_rewards = normalize_rewards(rewards, args.gamma)

    tf.reshape(rewards_list, [-1, 25])

    sess.run(train_op, feed_dict={observations: [observations_list],
                                  actions: [actions_list],
                                  rewards: rewards_list})
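
Note that in TF 1.x graph mode, tf.reshape only builds a graph op and returns a new tensor; since the result of tf.reshape(rewards_list, [-1, 25]) above is never used, it cannot change what gets fed. A minimal sketch, assuming the goal is the 1-D vector of scalar rewards that the shape-[None] placeholder expects, of flattening on the NumPy side instead:

import numpy as np

# Collapse the list of (1, 1) reward arrays into a flat float32 vector;
# a placeholder of shape [None] expects a 1-D value.
flat_rewards = np.asarray(rewards_list, dtype=np.float32).reshape(-1)

flat_rewards (a hypothetical name) could then be passed as the rewards feed in place of rewards_list.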

Comments:

- Please post the entire graph, not just its execution.
- Let me know if there is anything else I can do. Thank you.
- What are the reward values? A vector of size 25? What does np.array(rewards_list).shape print?
- The reward values index into a [25, 25] matrix. It is a distance matrix for 25 stops; the rewards are 1/2 of the distance.
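
Based on that exchange, r is a [25, 25] NumPy matrix and action comes back from sample_op as a (1, 1) array, so indexing r with it keeps those axes. A minimal sketch of the difference, with r and action as hypothetical stand-ins:

import numpy as np

r = np.random.rand(25, 25).astype(np.float32)  # hypothetical [25, 25] distance matrix
action = np.array([[7]])                       # shape of tf.random.categorical's output for one sample

print(r[0][action].shape)    # (1, 1) - indexing with an array keeps its axes
print(r[0][action[0, 0]])    # a scalar, which is what the rewards placeholder expects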