Tensorflow 2.0: 'numpy.dtype' object has no attribute 'is_floating' when explicitly computing and applying gradients with GradientTape

For reinforcement learning, I want to explicitly:

  • compute the gradients of the neural network with respect to the output softmax probabilities
  • update the network weights by gradient * action advantage score (increasing the probability of successful actions, decreasing the probability of failed ones)

I created an agent with a simple policy network:

def simple_policy_model(self):
    # Input, Dense, and Model come from tensorflow.keras
    inputs = Input(shape=(self.state_size,), name="Input")
    outputs = Dense(self.action_size, activation='softmax', name="Output")(inputs)
    predict_model = Model(inputs=[inputs], outputs=[outputs])
    return predict_model
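For reference, here is a standalone, runnable version of that policy network. The imports and the free-function form are my additions; state_size=4 and action_size=2 match the shapes in the model summary below:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

def simple_policy_model(state_size, action_size):
    # Single softmax layer mapping states to action probabilities
    inputs = Input(shape=(state_size,), name="Input")
    outputs = Dense(action_size, activation='softmax', name="Output")(inputs)
    return Model(inputs=[inputs], outputs=[outputs])

model = simple_policy_model(state_size=4, action_size=2)
print(model.count_params())  # 4*2 weights + 2 biases = 10
```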
Then I try to get the gradients:

agent = REINFORCE_Agent(state_size=env.observation_space.shape[0],
                        action_size=env.action_space.n)
print(agent.predict_model.summary())
state_memory = np.random.uniform(size=(3,4))/10
#state_memory = tf.convert_to_tensor(state_memory)
print(state_memory)
print(agent.predict_model.predict(state_memory))

with tf.GradientTape() as tape:
    probs = agent.predict_model.predict(state_memory)
    ### fails below ###
    grads = tape.gradient(probs, agent.predict_model.trainable_weights)
Output:

Model: "model_18"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Input (InputLayer)           (None, 4)                 0         
_________________________________________________________________
Output (Dense)               (None, 2)                 10        
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
None
state_memory [[0.01130021 0.01476066 0.09524527 0.05552276]
 [0.02018996 0.03127809 0.07232339 0.07146596]
 [0.08925738 0.08890574 0.04845396 0.0056015 ]]
prediction [[0.5127161  0.4872839 ]
 [0.5063317  0.49366832]
 [0.4817074  0.51829267]]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
...
AttributeError: 'numpy.dtype' object has no attribute 'is_floating'
If I convert state_memory to a tensor by uncommenting convert_to_tensor, it fails at .predict() instead.

It seems simple enough, but I'm stuck. Any idea what the correct way to get the gradients is?

The problem is that

probs = agent.predict_model.predict(state_memory)

produces a numpy array as output. You cannot take gradients with respect to a numpy array. Instead, you need a tf.Tensor coming out of the model, so the tape can trace the computation. To do that, call the model directly instead of using .predict():

with tf.GradientTape() as tape:
    probs = agent.predict_model(state_memory)  # call the model directly, not .predict()
grads = tape.gradient(probs, agent.predict_model.trainable_weights)
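Building on that fix, the full update the question asks for (weights adjusted by gradient * advantage) can be sketched as one REINFORCE step. Everything beyond calling the model inside the tape is illustrative: the actions, advantage values, one-hot masking, and Adam optimizer are my assumptions, not taken from the question:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Same policy network as in the question (state_size=4, action_size=2)
inputs = Input(shape=(4,), name="Input")
outputs = Dense(2, activation='softmax', name="Output")(inputs)
model = Model(inputs=[inputs], outputs=[outputs])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

states = tf.constant(np.random.uniform(size=(3, 4)) / 10, dtype=tf.float32)
actions = tf.constant([0, 1, 0])            # actions taken (illustrative)
advantages = tf.constant([1.0, -0.5, 0.3])  # advantage scores (illustrative)

with tf.GradientTape() as tape:
    probs = model(states)  # call the model, NOT .predict(), inside the tape
    # pick out the probability of the action actually taken in each state
    action_probs = tf.reduce_sum(probs * tf.one_hot(actions, 2), axis=1)
    # REINFORCE loss: -advantage * log pi(a|s); minimizing it raises the
    # probability of actions with positive advantage and lowers the rest
    loss = -tf.reduce_mean(advantages * tf.math.log(action_probs + 1e-8))

grads = tape.gradient(loss, model.trainable_weights)
optimizer.apply_gradients(zip(grads, model.trainable_weights))
```

Taking gradients of a scalar loss (rather than of probs directly) keeps the advantage weighting inside the tape, so a single apply_gradients call performs the whole update.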