
Python: create a softmax from a tf.distributions.Categorical output layer


I'm training an agent to take actions in a discrete environment, and I'm using a tf.distributions.Categorical output layer which I then sample from to create a softmax-style output that determines which action to take. The policy network I created looks like this:

pi_eval, _ = self._build_anet(self.state, 'pi', reuse=True)

def _build_anet(self, state_in, name, reuse=False):
    # Two ReLU hidden layers feed a linear logits layer, which is wrapped in a
    # tf.distributions.Categorical so actions can be sampled from it
    w_reg = tf.contrib.layers.l2_regularizer(L2_REG)
    with tf.variable_scope(name, reuse=reuse):
        layer_1 = tf.layers.dense(state_in, HIDDEN_LAYER_NEURONS, tf.nn.relu, kernel_regularizer=w_reg, name="pi_l1")
        layer_2 = tf.layers.dense(layer_1, HIDDEN_LAYER_NEURONS, tf.nn.relu, kernel_regularizer=w_reg, name="pi_l2")
        a_logits = tf.layers.dense(layer_2, self.a_dim, kernel_regularizer=w_reg, name="pi_logits")
        dist = tf.distributions.Categorical(logits=a_logits)
    params = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=name)
    return dist, params
I then sample the network and build the samples up into a categorical-distribution-style output to use as the softmax.
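A minimal sketch of what that sampling-and-counting step might look like, assuming the counts come from tf.histogram_fixed_width applied to samples drawn from the Categorical distribution; the sample count N_SAMPLES is illustrative, and the [0, 1] value_range shown here is only a stand-in that reproduces the symptom described below:

N_SAMPLES = 1000                                   # illustrative number of samples per state
samples = pi_eval.sample(N_SAMPLES)                # integer action indices drawn from the Categorical
counts = tf.histogram_fixed_width(values=samples,
                                  value_range=[0, 1],  # illustrative range; choosing this correctly is the question below
                                  nbins=self.a_dim)
# normalise the per-action counts into a probability vector
self.logits_action = tf.cast(counts, tf.float32) / N_SAMPLES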

I run it like this:

softmax = self.sess.run([self.logits_action], {self.state: state[np.newaxis, :]})
But the output only ever has two non-zero entries:

[0.44329998 0.         0.         0.5567    ]
[0.92139995 0.         0.         0.0786    ]
[0.95699996 0.         0.         0.043     ]
[0.7051 0.     0.     0.2949]
My hunch is that this is something to do with value_range, whose documentation (for tf.histogram_fixed_width) says:

value_range: Shape [2] Tensor of same dtype as values. values <= value_range[0] will be mapped to hist[0], values >= value_range[1] will be mapped to hist[-1].


But I'm not sure what value_range should be. Does anyone have any ideas?
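As a concrete illustration of the quoted behaviour (hypothetical action indices, four actions): with a narrow range such as [0, 1], every sampled index >= 1 is mapped to the last bin, so only the first and last entries of the histogram can ever be non-zero, which matches the outputs above.

import tensorflow as tf

actions = tf.constant([0, 1, 2, 3, 3, 0])  # hypothetical sampled action indices, a_dim = 4
narrow = tf.histogram_fixed_width(actions, value_range=[0, 1], nbins=4)

with tf.Session() as sess:
    print(sess.run(narrow))  # [2 0 0 4] -- only the first and last bins receive counts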

As I suspected, it was indeed to do with value_range, and I should set the upper bound to the action dimension:

value_range=[0, self.a_dim]

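For completeness, a sketch of the corrected call under the same illustrative assumptions as the sampling sketch above: with value_range=[0, self.a_dim] and nbins=self.a_dim, each bin is exactly one action index wide, so every action receives its own count.

samples = pi_eval.sample(N_SAMPLES)                             # integer action indices in [0, self.a_dim)
counts = tf.histogram_fixed_width(values=samples,
                                  value_range=[0, self.a_dim],  # upper bound set to the action dimension
                                  nbins=self.a_dim)
self.logits_action = tf.cast(counts, tf.float32) / N_SAMPLES    # all self.a_dim entries can now be non-zero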