Why does theano.tensor.stack() make function compilation so much faster?
I am writing a function that takes a series of enzyme and substrate concentrations and returns a series of maximum reaction rates:
import theano.tensor as T
from theano import function
self.rateFunction = function(enzyme_vars_array + substrate_vars_array, rateExpressionsArray, on_unused_input='ignore')
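enzyme_vars_array and substrate_vars_array are lists of theano tensors, built with T.dscalar('name of the thing').
rateExpressionsArray is a list of Python expressions that give the kinetic rates of the reactions in terms of the theano tensors in enzyme_vars_array and substrate_vars_array. Because some enzyme and substrate concentrations may not be used, I included the on_unused_input='ignore' flag.
The sizes of these inputs are:
len(enzyme_vars_array) = 132
len(substrate_vars_array) = 17
len(rateExpressionsArray) = 2402
Running this line of code as written leads to a very long compilation of at least 5 minutes (I eventually gave up waiting).
However, this change cuts the compile time down to a few seconds: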
self.rateFunction = function(enzyme_vars_array + substrate_vars_array, T.stack(rateExpressionsArray), on_unused_input='ignore')
What does theano.tensor.stack() do that changes the compile time so dramatically? And what made the original formulation so slow?
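To put numbers on "at least 5 minutes" versus "a few seconds", the two compilations could be timed with a small helper. This is only a minimal sketch: the name time_call is hypothetical, and the workload below is a stand-in so that the snippet runs without Theano installed.

```python
from timeit import default_timer


def time_call(build):
    """Run a zero-argument callable and return (result, elapsed_seconds)."""
    start = default_timer()
    result = build()
    elapsed = default_timer() - start
    return result, elapsed


# Stand-in workload; with Theano, pass a lambda that calls function(...) instead.
_, seconds = time_call(lambda: sum(range(100000)))
print("elapsed: %.4f s" % seconds)
```

With Theano available, the callable would wrap the compilation itself, for example lambda: function(enzyme_vars_array + substrate_vars_array, T.stack(rateExpressionsArray), on_unused_input='ignore'), and the same call would be repeated for the variant without T.stack.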
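Edit:
While writing example code that can be run, this problem got even stranger. This code generates example versions of enzyme_vars_array, substrate_vars_array, and rateExpressionsArray: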
import theano.tensor as T
from theano import function
import numpy as np
# 132 enzyme names: e1 ... e132
enzyme_names = ['e%d' % i for i in range(1, 133)]
substrate_names = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12','s13','s14','s15','s16','s17']
enzyme_vars_array = T.dscalars(*enzyme_names)
substrate_vars_array = T.dscalars(*substrate_names)
rateExpressionsArray = [0]*2292
# Populate rateExpressionsArray with random expressions in terms of the theano variables
for index, rateExpression in enumerate(rateExpressionsArray):
    random_enzyme_index = np.random.randint(len(enzyme_vars_array))
    random_substrate_index = np.random.randint(len(substrate_vars_array))
    rateExpressionsArray[index] = enzyme_vars_array[random_enzyme_index] * substrate_vars_array[random_substrate_index]
rateFunction = function(enzyme_vars_array + substrate_vars_array, rateExpressionsArray, on_unused_input='ignore')
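This is the code I expected to run slowly (because it runs without T.stack). However, in this example it now runs at a perfectly reasonable speed. But when I replace the last line with the following: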
rateFunction = function(enzyme_vars_array + substrate_vars_array, T.stack(rateExpressionsArray), on_unused_input='ignore')
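it again runs very slowly, the opposite of the behavior I observed in my actual code.
I thought about how this example might differ from my actual code and realized that my real rateExpressionsArray is very sparse: it consists mostly of a placeholder that just returns a default value, with only a handful of actual expressions. So, rewriting my example code to look more like that, I have: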
import theano.tensor as T
from theano import function
import numpy as np
# 132 enzyme names: e1 ... e132
enzyme_names = ['e%d' % i for i in range(1, 133)]
substrate_names = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12','s13','s14','s15','s16','s17']
enzyme_vars_array = T.dscalars(*enzyme_names)
substrate_vars_array = T.dscalars(*substrate_names)
# This is the placeholder expression which returns an input unmodified
noRate = T.dscalar('noRate')
rateExpressionsArray = [noRate]*2292
# Include only a handful of nontrivial expressions
for index in range(1, 50):
    random_enzyme_index = np.random.randint(len(enzyme_vars_array))
    random_substrate_index = np.random.randint(len(substrate_vars_array))
    rateExpressionsArray[index] = enzyme_vars_array[random_enzyme_index] * substrate_vars_array[random_substrate_index]
rateFunction = function(enzyme_vars_array + substrate_vars_array + [noRate], T.stack(rateExpressionsArray), on_unused_input='ignore')
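This reproduces the original behavior: when only a few indices of rateExpressionsArray are nontrivial expressions, the code runs fast with T.stack() and slowly without it. To reproduce the slow version, replace the last line with: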
rateFunction = function(enzyme_vars_array + substrate_vars_array + [noRate], rateExpressionsArray, on_unused_input='ignore')
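To make the sparse versus dense distinction concrete: [noRate]*2292 repeats one and the same Python object, so the sparse list holds 2292 entries but only 50 distinct expressions, while the dense list holds 2292 distinct ones. The sketch below uses plain object() stand-ins instead of Theano variables (an assumption made purely so it runs without Theano), and count_distinct is a hypothetical helper. It only illustrates the structure of the two lists and is not a claim about what Theano does with them during compilation.

```python
def count_distinct(expressions):
    """Count distinct Python objects in a list, compared by identity."""
    return len(set(id(e) for e in expressions))


# Sparse case: one shared placeholder, with indices 1..49 overwritten.
placeholder = object()  # stand-in for noRate
sparse = [placeholder] * 2292
for index in range(1, 50):
    sparse[index] = object()  # stand-in for an enzyme * substrate product

# Dense case: every entry is its own freshly built expression.
dense = [object() for _ in range(2292)]

print(count_distinct(sparse))  # 50
print(count_distinct(dense))   # 2292
```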
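It seems T.stack behaves differently on sparse and dense inputs? Now I am even more confused about what stack() does and how it works. Any insight into what is going on here would be much appreciated.
Comment: Can you share more of your code? Or, perhaps better, can you find a simpler example that demonstrates the behavior and can be copied, pasted, and run without dependencies?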