Unexpectedly long computation time with the Python uncertainties package

Consider the following code snippet:

import random
from uncertainties import unumpy, ufloat

x = [random.uniform(0,1) for p in range(1,8200)]
y = [random.randrange(0,1000) for p in range(1,8200)]
xerr = [random.uniform(0,1)/1000 for p in range(1,8200)]
yerr = [random.uniform(0,1)*10 for p in range(1,8200)]

x = unumpy.uarray(x, xerr)
y = unumpy.uarray(y, yerr)
diff = sum(x*y)
u = ufloat(0.0, 0.0)
for k in range(len(x)):
    u += (diff - x[k])**2 * y[k]

print(u)
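If I try to run this on my computer, it takes 10 minutes to produce a result. I really don't know why this happens and hope someone can explain it. If I had to guess, I would say that for some reason computing the uncertainties is more complicated than one would think, but as I said, that is only a guess. Interestingly, if I remove the print statement at the end, the code finishes almost instantly, which confuses me more than it helps.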


In case you don't know it, the library in question is the uncertainties package (see its repository).

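I can reproduce this; the printing is what takes forever. Or rather, it is the conversion to a string that print calls implicitly. I profiled the __format__ function of AffineScalarFunc (it is called by __str__, which is called by print). I reduced the array size from 8200 to 1000 to make it run faster. This is the result (cut for readability):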

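You can see that almost all of the time is spent on line 1967, where the standard deviation is computed. If you dig a bit deeper, you find that the error_components property is the problem, within which the derivatives property is the problem, within which _linear_part.expand() is the problem. If you profile that, you start to get to the root of the issue. Most of the work here is evenly distributed: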

Function: expand at line 1481

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1481                                               @profile
  1482                                               def expand(self):
  1483                                                   """
  1484                                                   Expand the linear combination.
  1485                                           
  1486                                                   The expansion is a collections.defaultdict(float).
  1487                                           
  1488                                                   This should only be called if the linear combination is not
  1489                                                   yet expanded.
  1490                                                   """
  1491                                           
  1492                                                   # The derivatives are built progressively by expanding each
  1493                                                   # term of the linear combination until there is no linear
  1494                                                   # combination to be expanded.
  1495                                           
  1496                                                   # Final derivatives, constructed progressively:
  1497         1          2.0      2.0      0.0          derivatives = collections.defaultdict(float)
  1498                                           
  1499  15995999    4942237.0      0.3      9.7          while self.linear_combo:  # The list of terms is emptied progressively
  1500                                           
  1501                                                       # One of the terms is expanded or, if no expansion is
  1502                                                       # needed, simply added to the existing derivatives.
  1503                                                       #
  1504                                                       # Optimization note: since Python's operations are
  1505                                                       # left-associative, a long sum of Variables can be built
  1506                                                       # such that the last term is essentially a Variable (and
  1507                                                       # not a NestedLinearCombination): popping from the
  1508                                                       # remaining terms allows this term to be quickly put in
  1509                                                       # the final result, which limits the number of terms
  1510                                                       # remaining (and whose size can temporarily grow):
  1511  15995998    6235033.0      0.4     12.2              (main_factor, main_expr) = self.linear_combo.pop()
  1512                                           
  1513                                                       # print "MAINS", main_factor, main_expr
  1514                                           
  1515  15995998   10572206.0      0.7     20.8              if main_expr.expanded():
  1516  15992002    6822093.0      0.4     13.4                  for (var, factor) in main_expr.linear_combo.items():
  1517   7996001    8070250.0      1.0     15.8                      derivatives[var] += main_factor*factor
  1518                                           
  1519                                                       else:  # Non-expanded form
  1520  23995993    8084949.0      0.3     15.9                  for (factor, expr) in main_expr.linear_combo:
  1521                                                               # The main_factor is applied to expr:
  1522  15995996    6208091.0      0.4     12.2                      self.linear_combo.append((main_factor*factor, expr))
  1523                                           
  1524                                                       # print "DERIV", derivatives
  1525                                           
  1526         1          2.0      2.0      0.0          self.linear_combo = derivatives
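You can see that there are a great many calls to expanded, which in turn calls isinstance. Also note the comments, which hint that the library only computes the derivatives when they are actually needed (and is aware that it would be really slow otherwise). That is why the conversion to a string takes so long and no time is spent before it.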
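To illustrate how a deferred expansion like this can become expensive, here is a minimal toy sketch in plain Python. It is not the uncertainties implementation: LinearCombo and count_expand_pops are hypothetical names, and the scenario (a sum of n terms that each reference the same non-expanded n-variable sum, loosely like diff in the question) is an assumption about what drives the cost.

```python
import collections


class LinearCombo:
    """Toy stand-in for a lazily expanded linear combination (hypothetical)."""

    def __init__(self, terms):
        # Expanded form: dict {variable_name: coefficient}.
        # Non-expanded form: list of (factor, LinearCombo) pairs.
        self.terms = terms

    def expanded(self):
        return isinstance(self.terms, dict)

    def expand(self):
        """Flatten into {variable_name: coefficient}; return the number of pops."""
        derivatives = collections.defaultdict(float)
        pops = 0
        while self.terms:
            factor, expr = self.terms.pop()
            pops += 1
            if expr.expanded():
                for var, coeff in expr.terms.items():
                    derivatives[var] += factor * coeff
            else:
                # The nested terms are re-queued; expr itself stays
                # non-expanded, so it is walked again the next time.
                for sub_factor, sub_expr in expr.terms:
                    self.terms.append((factor * sub_factor, sub_expr))
        self.terms = derivatives
        return pops


def count_expand_pops(n):
    """Expand a sum of n terms that all reference one shared n-variable sum."""
    shared = LinearCombo({"v0": 1.0})
    for k in range(1, n):
        # Left-associative sum, as built by sum() or a += loop.
        shared = LinearCombo([(1.0, shared), (1.0, LinearCombo({f"v{k}": 1.0}))])
    total = LinearCombo([(float(k), shared) for k in range(1, n + 1)])
    return total.expand()


for n in (100, 200, 400):
    print(n, count_expand_pops(n))
```

In this toy model, building the expression takes a number of steps proportional to n, but doubling n roughly quadruples the number of pops during expansion, because the shared sum is walked again for every term that references it.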

In __init__ of AffineScalarFunc:

# In order to have a linear execution time for long sums, the
# _linear_part is generally left as is (otherwise, each
# successive term would expand to a linearly growing sum of
# terms: efficiently handling such terms [so, without copies]
# is not obvious, when the algorithm should work for all
# functions beyond sums).
In std_dev of AffineScalarFunc:

#! It would be possible to not allow the user to update the
#std dev of Variable objects, in which case AffineScalarFunc
#objects could have a pre-calculated or, better, cached
#std_dev value (in fact, many intermediate AffineScalarFunc do
#not need to have their std_dev calculated: only the final
#AffineScalarFunc returned to the user does).
In expand of LinearCombination:

# The derivatives are built progressively by expanding each
# term of the linear combination until there is no linear
# combination to be expanded.

All in all, this is somewhat expected, because the library handles these non-native numbers, which (apparently) require a lot of operations.
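To make "a lot of operations" concrete: in standard linear error propagation for independent variables, the standard deviation of a result needs one derivative per variable, sigma_f = sqrt(sum((df/dx_i * sigma_i)**2)). The sketch below shows that formula in plain Python. The helper propagate_std_dev is a hypothetical name for illustration, not part of uncertainties.

```python
import math


def propagate_std_dev(derivatives, std_devs):
    """Linear error propagation for independent variables.

    derivatives: {name: partial derivative of f at the nominal point}
    std_devs:    {name: standard deviation of that variable}
    """
    return math.sqrt(sum((derivatives[name] * std_devs[name]) ** 2
                         for name in derivatives))


# f = x * y at x = 2, y = 3: df/dx = y = 3 and df/dy = x = 2.
sigma_f = propagate_std_dev({"x": 3.0, "y": 2.0}, {"x": 0.1, "y": 0.2})
print(sigma_f)
```

For this small example the result is approximately 0.5. In the question, the final value depends on roughly 16,400 independent variables (the elements of x and y), so the derivative with respect to each of them has to be collected before the standard deviation can be printed.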

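Not sure what uncertainties is, or what unumpy etc. do. But a for loop over the length of uarray(x, xerr) can take a while if it is a very long list. What is the length of x? Have you timed it to see which part takes the time?

In this example it is 8200. That does not seem like a very long array to me, or at least I would not expect basic operations on such a list to take this long…

@Torxed Only now saw your second question... Yes, I timed the for loop with tqdm: it reaches 100% almost immediately, but the program just does not finish. What actually takes the time is print(u).

I checked, and each iteration of the loop takes about 1.1444091796875e-05 seconds, so the whole loop takes about 0.182 seconds. What takes the time is printing the AffineScalarFunc (u). Not sure what print(u.n) means, but it is a pretty big number: 1.7427233520528605e+19. So I guess there are more numbers in there than you would think?

By setting
y = [random.randrange(0,1) for p in range(1,8200)]
yerr = [random.uniform(0,1) for p in range(1,8200)]
the process definitely speeds up a bit (since the result is a number ~1), but it still takes quite a while... To be honest, no matter how large u is, I am still puzzled why it takes so long to print…

Thanks a lot for your explanation! It is certainly useful to know that the problem lies only in printing the standard deviation. Have you found a way to deal with it?

I am not very familiar with this library, but for now it seems you just have to live with it. At some point the final value has to be computed, whether during printing or before. The author has built in some caching, so I suppose some effort has gone into improving the speed.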
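The last point, that the final value has to be computed once and can then be cached, can be illustrated with a small stand-in class. This is a sketch only: LazyResult is a hypothetical name, and it uses functools.cached_property (Python 3.8+) rather than whatever caching uncertainties uses internally.

```python
import functools
import math


class LazyResult:
    """Hypothetical stand-in: cheap to build, costly to summarize."""

    def __init__(self, contributions):
        self.contributions = list(contributions)
        self.evaluations = 0  # how many times the costly step actually ran

    @functools.cached_property
    def std_dev(self):
        # Deferred until first access, then stored on the instance.
        self.evaluations += 1
        return math.sqrt(sum(c * c for c in self.contributions))

    def __str__(self):
        return f"+/-{self.std_dev:.3f}"


result = LazyResult([3.0, 4.0])
first = str(result)   # triggers the costly computation
second = str(result)  # reuses the cached value
print(first, second, result.evaluations)
```

With this pattern, only the first conversion to a string pays for the computation; later conversions reuse the stored value.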