Python: Disconnected input in the gradient of a Theano scan Op

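I have several groups of items, of varying sizes. For each of these groups, one (known) item is the "correct" one. There is a function that assigns a score to each item. This yields a flat vector of item scores, plus vectors giving the index where each group begins and how big it is. I want to apply a "softmax" over the scores in each group to assign probabilities to the items, then take the sum of the logs of the probabilities of the correct answers. Here is a simpler version, where we just return the score of the correct answer, without the softmax and the logarithm.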

import numpy
import theano
import theano.tensor as T
from theano.printing import Print

def scoreForCorrectAnswer(groupSize, offset, correctAnswer, preds):
    # for each group, this will get called with the size of
    # the group, the offset of where the group begins in the
    # predictions vector, and which item in that group is correct
    relevantPredictions = preds[offset:offset+groupSize]
    ans = Print("CorrectAnswer")(correctAnswer)
    return relevantPredictions[ans]

groupSizes = T.ivector('groupSizes')
offsets = T.ivector('offsets')
x = T.fvector('x')
W = T.vector('W')
correctAnswers = T.ivector('correctAnswers')

# for this simple example, we'll just score the items by
# element-wise product with a weight vector
predictions = x * W

(values, updates) = theano.map(fn=scoreForCorrectAnswer,
    sequences=[groupSizes, offsets, correctAnswers],
    non_sequences=[predictions])

func = theano.function([groupSizes, offsets, correctAnswers,
        W, x], [values])

sampleInput = numpy.array([0.1,0.7,0.3,0.05,0.3,0.3,0.3], dtype='float32')
sampleW = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], dtype='float32')
sampleOffsets = numpy.array([0,4], dtype='int32')
sampleGroupSizes = numpy.array([4,3], dtype='int32')
sampleCorrectAnswers = numpy.array([1,2], dtype='int32')

data = func(sampleGroupSizes, sampleOffsets, sampleCorrectAnswers, sampleW, sampleInput)
print(data)

# all three of these raise the same exception (see below)
gW1 = T.grad(cost=T.sum(values), wrt=W)
gW2 = T.grad(cost=T.sum(values), wrt=W, disconnected_inputs='warn')
gW3 = T.grad(cost=T.sum(values), wrt=W, consider_constant=[groupSizes,offsets])

This correctly computes the output, but when I try to take the gradient with respect to the parameter W, I get (paths abbreviated):

Traceback (most recent call last):
  File "test_scan_for_stackoverflow.py", line 37, in <module>
    gW = T.grad(cost=T.sum(values), wrt=W)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 438, in grad
    outputs, wrt, consider_constant)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 698, in _populate_var_to_app_to_idx
    account_for(output)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 694, in account_for
    account_for(ipt)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 669, in account_for
    connection_pattern = _node_to_pattern(app)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 554, in _node_to_pattern
    connection_pattern = node.op.connection_pattern(node)
  File "Theano-0.6.0rc2-py2.7.egg/theano/scan_module/scan_op.py", line 1331, in connection_pattern
ils)
  File "Theano-0.6.0rc2-py2.7.egg/theano/scan_module/scan_op.py", line 1266, in compute_gradient
    known_grads={y: g_y}, wrt=x)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 511, in grad
    handle_disconnected(elem)
  File "Theano-0.6.0rc2-py2.7.egg/theano/gradient.py", line 497, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError: grad method was asked to compute
the gradient with respect to a variable that is not part of the
computational graph of the cost, or is used only by a
non-differentiable operator: groupSizes[t]

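Now, groupSizes is constant, so there is no reason to take any gradient with respect to it. Ordinarily you can handle this either by suppressing DisconnectedInputErrors or by telling Theano to treat groupSizes as a constant in the T.grad call (see the last lines of the sample script). But there does not seem to be any way to pass such options down to the internal T.grad calls in the gradient computation for the ScanOp.

Am I missing something? Is there a way to get the gradient computation to work through the ScanOp here?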
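For reference, the full objective described at the top of the question (a softmax within each group, then the sum of the log-probabilities of the correct items) can be sketched in plain Python without Theano. This is only an illustration of the intended arithmetic, not the Theano implementation; the function name correct_answer_log_likelihood is hypothetical, and the data is the sample input from the script, with weights of 1.0 so the scores equal x.

```python
import math

def correct_answer_log_likelihood(scores, offsets, group_sizes, correct_answers):
    """Sum over groups of the log softmax probability of the correct item."""
    total = 0.0
    for offset, size, correct in zip(offsets, group_sizes, correct_answers):
        group = scores[offset:offset + size]
        # Subtract the group maximum before exponentiating for numerical stability.
        peak = max(group)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in group))
        # log softmax(group)[correct] = score - log(sum(exp(scores in group)))
        total += group[correct] - log_norm
    return total

# Sample data from the script (hypothetical reuse for illustration).
scores = [0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3]
offsets = [0, 4]
group_sizes = [4, 3]
correct_answers = [1, 2]

log_likelihood = correct_answer_log_likelihood(
    scores, offsets, group_sizes, correct_answers)
print(log_likelihood)
```

The second group has three equal scores, so its correct item contributes exactly -log(3), which is a convenient sanity check for any Theano version of the same computation.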

As of mid-February 2013 (Theano 0.6.0rc2), this is a Theano bug. It has been fixed in the development version on GitHub as of the date of this post.
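For readers stuck on the affected release, one possible workaround for the simplified case is to avoid scan altogether. This is not part of the original answer and has not been tested against Theano 0.6.0rc2. The item picked from each group sits at position offset + correctAnswer of the flat vector, so an indexing expression such as predictions[offsets + correctAnswers] should select the same values without creating a scan node. The plain-Python sketch below only checks that index arithmetic, assuming every correct index satisfies 0 <= correctAnswer < groupSize; the helper names are hypothetical.

```python
def select_with_loop(preds, group_sizes, offsets, correct_answers):
    # Mirrors scoreForCorrectAnswer: slice out each group, then index into it.
    return [preds[o:o + n][c]
            for n, o, c in zip(group_sizes, offsets, correct_answers)]

def select_with_flat_index(preds, offsets, correct_answers):
    # Addresses the same items directly in the flat vector.
    return [preds[o + c] for o, c in zip(offsets, correct_answers)]

# Sample data from the script (weights of 1.0 leave the scores unchanged).
preds = [0.1, 0.7, 0.3, 0.05, 0.3, 0.3, 0.3]
group_sizes = [4, 3]
offsets = [0, 4]
correct_answers = [1, 2]

looped = select_with_loop(preds, group_sizes, offsets, correct_answers)
flat = select_with_flat_index(preds, offsets, correct_answers)
print(looped == flat)
```

The full softmax objective still needs per-group normalization, so this shortcut covers only the simplified script; for the general case, upgrading to a Theano version that contains the fix is the route the answer points to.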
