Prepend vs. Append performance in Mathematica

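In Lisp-like systems, cons is the usual way to prepend an element to a list. Functions that append to a list are much more expensive, because they walk the list to its end and then replace the final null with a reference to the appended item. In other words (in pseudo-Lisp terms), prepending is a single cons, while appending is a walk to the end of the list followed by a pointer update.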
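To make that Lisp cost model concrete, here is a minimal sketch that emulates Lisp cons cells in Mathematica itself. The heads cons and nil and the helpers consPrepend, consAppend and consToList are hypothetical names chosen for this illustration only; they are not built-in functions, and this is not how Mathematica stores its own lists.

```mathematica
(* Lisp-style cons cells: cons[head, tail], with the symbol nil as the empty list.
   All names here are hypothetical helpers for illustration only. *)
ClearAll[cons, nil, consPrepend, consAppend, consToList];

(* Prepend allocates exactly one new cell and shares the old list: O(1). *)
consPrepend[item_, list_] := cons[item, list];

(* Append must walk to the end; this immutable version also rebuilds every cell: O(n).
   It recurses once per element, so keep the lists well below $RecursionLimit. *)
consAppend[nil, item_] := cons[item, nil];
consAppend[cons[h_, t_], item_] := cons[h, consAppend[t, item]];

(* Convert to an ordinary Mathematica list by walking the cells once. *)
consToList[c_] := Module[{cur = c},
   Flatten[Reap[While[cur =!= nil, Sow[First[cur]]; cur = Last[cur]]][[2]], 1]];

l = consPrepend[1, consPrepend[2, nil]];   (* cons[1, cons[2, nil]] *)
consToList[consPrepend[0, l]]
consToList[consAppend[l, 3]]
```

The recursive consAppend is deliberately naive: it mirrors "walk the list to the end" so the asymmetry between the two operations is visible in the code itself.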

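The question is whether the situation in Mathematica is similar. In most respects, Mathematica's lists seem to be singly linked like Lisp's, and if so, we might assume that Append[list, item] is much more expensive than Prepend[list, item]. However, I could not find anything in the Mathematica documentation that addresses this. If Mathematica's lists were doubly linked, or implemented more cleverly (say, in a heap, or simply by maintaining a pointer to the last element), then insertion could have completely different performance.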
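One way to settle the question empirically is to time many successive Prepend and Append calls on a growing list. This is a minimal sketch: growWith and timeGrow are hypothetical helper names, and the list sizes are arbitrary. If lists were singly linked, one would expect only the Append column to grow much faster than linearly as k doubles; if both columns grow the same way, that points to both operations copying the list.

```mathematica
(* Build a list of length k by repeatedly applying op (Prepend or Append). *)
growWith[op_, k_Integer?NonNegative] := Module[{lst = {}},
   Do[lst = op[lst, i], {i, k}];
   lst];

(* Wall-clock time in seconds for one such build. *)
timeGrow[op_, k_Integer?NonNegative] := First[AbsoluteTiming[growWith[op, k]]];

(* Rows of {k, Prepend time, Append time}; doubling k shows how each cost scales. *)
Table[{k, timeGrow[Prepend, k], timeGrow[Append, k]}, {k, {1000, 2000, 4000, 8000}}]
```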

Any advice or experience would be much appreciated.

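Mathematica's lists are not singly linked lists like those in Common Lisp. It is better to think of Mathematica lists as arrays or vector-like structures. Insertion is O(n), but retrieval takes constant time.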
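A quick way to observe the array semantics described above: Append returns a new list and leaves its argument untouched, while Part reaches any position directly by index. A minimal sketch; the variable names are arbitrary.

```mathematica
lst = Range[5];
longer = Append[lst, 6];   (* a new 6-element list; lst itself is not modified *)
{lst, longer}

(* Indexing does not walk the list: the first, middle and last elements
   are all reached directly, even in a million-element list. *)
big = Range[10^6];
{big[[1]], big[[500000]], big[[-1]]}
{First[AbsoluteTiming[big[[1]]]], First[AbsoluteTiming[big[[-1]]]]}
```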

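Check out this article, which includes more details on Mathematica lists.

Also, take a look at the Stack Overflow question on linked lists and their performance in Mathematica.

As a small addendum, here is an efficient alternative to "AppendTo" in Mathematica:

As mentioned, because Mathematica lists are implemented as arrays, operations like Append and Prepend copy the whole list every time an element is added. A more efficient approach is to preallocate a list and fill it, but my experiment below did not show as large a difference as I expected. Apparently the linked-list method is better still; I will have to look into it.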

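Needs["PlotLegends`"]  (* legacy legend package used with Mathematica 8; version 9+ has a built-in PlotLegends option *)
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   (* Append: grow a copy of startlist one element at a time; each AppendTo copies the list.
      Note that /@ also collects every intermediate list that AppendTo returns. *)
   appendlist = startlist;
   appendtime = 
    First[AbsoluteTiming[AppendTo[appendlist, #] & /@ datalist]];
   (* Preallocation: reserve the final length, then overwrite the trailing Null slots
      from the end, using negative indices -1, -2, ... *)
   preallocatedlist = Join[startlist, Table[Null, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, appendtime}, {n, preallocatedtime}}];
results = test[#] & /@ Range[26];  (* 1000 to 26000 new elements *)
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Appending", "Preallocating"}, 
 LegendPosition -> {1, 0}]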

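Timing plot comparing AppendTo with preallocation. (Run time: 82 seconds)

EDIT

Using the modification suggested by nixeagle improved the preallocation time considerably, namely using

preallocatedlist = Join[startlist, ConstantArray[0, {Length[datalist]}]]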
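The preallocate-then-fill pattern from the edit above can be packaged as a small function. This is a sketch under assumptions: fillPreallocated is a hypothetical name, and it uses the real zero 0. instead of the integer 0 so that the buffer's element type matches RandomReal data. The Developer`PackedArrayQ line is only a diagnostic for comparing the ConstantArray buffer with the Table[Null, ...] one; the thread itself does not explain why the ConstantArray version is faster.

```mathematica
(* Preallocate the full result once, then overwrite entries in place. *)
fillPreallocated[data_List] := Module[{out = ConstantArray[0., Length[data]]},
   Do[out[[i]] = data[[i]], {i, Length[data]}];
   out];

fillPreallocated[RandomReal[1, 5]]

(* Diagnostic only: is each candidate buffer stored as a packed array? *)
{Developer`PackedArrayQ[ConstantArray[0., 10]], Developer`PackedArrayQ[Table[Null, {10}]]}
```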

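SECOND EDIT
A linked list of the form {{startlist, data1}, data2} (where startlist is itself a list) works even better, and it has the great advantage of not requiring the size to be known in advance, as preallocation does.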
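The nested-pair trick relies on a final Flatten, which would also flatten any element that is itself a list. A common variant, sketched here under the assumption that a dedicated head is acceptable, nests the pairs under a private head (ll, a hypothetical name) and flattens only that head using the three-argument form Flatten[expr, n, h]. linkedAppend and linkedToList are likewise hypothetical helper names.

```mathematica
ClearAll[ll, linkedAppend, linkedToList, acc];

(* O(1) per step: wrap the existing chain in a new ll pair; nothing is copied. *)
linkedAppend[chain_ll, x_] := ll[chain, x];

(* One O(n) pass at the end: flatten only subexpressions with head ll. *)
linkedToList[chain_ll] := List @@ Flatten[chain, Infinity, ll];

acc = ll[];
Do[acc = linkedAppend[acc, {i, i^2}], {i, 3}];
linkedToList[acc]   (* the {i, i^2} pairs survive as elements *)

(* With plain list pairs, Flatten also destroys the element structure: *)
Flatten[{{{{}, {1, 1}}, {2, 4}}, {3, 9}}]
```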
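Needs["PlotLegends`"]  (* legacy legend package used with Mathematica 8; version 9+ has a built-in PlotLegends option *)
test[n_] := Module[{startlist = Range[1000]},
   datalist = RandomReal[1, n*1000];
   (* Linked list: wrap the previous structure in a new pair {old, new};
      the existing elements are not copied *)
   linkinglist = startlist;
   linkedlisttime = 
    First[AbsoluteTiming[
      Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
        Length[datalist]}];
      (* one Flatten at the end; safe here only because every element is a number *)
      linkedlist = Flatten[linkinglist];]];
   (* Preallocation with ConstantArray, filled from the end as before *)
   preallocatedlist = 
    Join[startlist, ConstantArray[0, {Length[datalist]}]];
   count = -1;
   preallocatedtime = 
    First[AbsoluteTiming[
      Do[preallocatedlist[[count]] = datalist[[count]]; 
       count--, {Length[datalist]}]]];
   {{n, preallocatedtime}, {n, linkedlisttime}}];
results = test[#] & /@ Range[26];
ListLinePlot[Transpose[results], Filling -> Axis, 
 PlotLegend -> {"Preallocating", "Linked-List"}, 
 LegendPosition -> {1, 0}]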

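Timing comparison of the linked list vs. preallocation. (Run time: 6 seconds)
If you know how many elements the result will have and can compute each element, you do not need Append, AppendTo, linked lists, or any of that. In Chris's speed test, preallocation only works because he knows the number of elements in advance; the accesses to datalist stand in for a dummy computation of the current element.
If that is the situation, I would definitely not use this approach: a simple Table followed by a Join is faster. Let me reuse Chris's code. I moved the preallocation into the timed region, because memory allocation is also measured when using Append or the linked list. Furthermore, I actually use the resulting lists and check that they are equal, because a clever interpreter might recognize simple, useless commands and optimize them away.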
Needs["PlotLegends`"]  (* legacy legend package used with Mathematica 8; version 9+ has a built-in PlotLegends option *)
test[n_] := Module[{
    startlist = Range[1000],
    datalist, joinResult, linkedResult, linkinglist, linkedlist, 
    preallocatedlist, linkedlisttime, preallocatedtime, count, 
    joinTime, preallocResult},

   datalist = RandomReal[1, n*1000];

   (* Linked list: nest {previous, new} pairs, then Flatten once at the end *)
   linkinglist = startlist;
   {linkedlisttime, linkedResult} = 
    AbsoluteTiming[
     Do[linkinglist = {linkinglist, datalist[[i]]}, {i, 
       Length[datalist]}];
     linkedlist = Flatten[linkinglist]
     ];

   (* Preallocation: the allocation itself is now inside the timed region *)
   count = -1;
   preallocatedtime = First@AbsoluteTiming[
      (preallocatedlist = 
        Join[startlist, ConstantArray[0, {Length[datalist]}]];
       Do[preallocatedlist[[count]] = datalist[[count]];
        count--, {Length[datalist]}]
       )
      ];

   (* Direct construction: compute all new elements with Table, then Join once *)
   {joinTime, joinResult} =
    AbsoluteTiming[
     Join[startlist, 
      Table[datalist[[i]], {i, 1, Length[datalist]}]]];
   (* Use every result so nothing can be optimized away; all pairs should compare True *)
   PrintTemporary[
    Equal @@@ Tuples[{linkedResult, preallocatedlist, joinResult}, 2]];
   {preallocatedtime, linkedlisttime, joinTime}];

results = test[#] & /@ Range[40];
ListLinePlot[Transpose[results], PlotStyle -> {Black, Gray, Red}, 
 PlotLegend -> {"Prealloc", "Linked", "Joined"}, 
 LegendPosition -> {1, 0}]
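When the number of elements is known and each one can be computed from its index, the result can be built in a single call, as the Join/Table variant above does. A minimal sketch; f is a hypothetical stand-in for the per-element computation, and Array is shown as an equivalent alternative to Table.

```mathematica
ClearAll[f];
f[i_] := N[Sin[i]];   (* hypothetical per-element computation *)
n = 5;
start = Range[3];

viaTable = Join[start, Table[f[i], {i, n}]];
viaArray = Join[start, Array[f, n]];   (* Array[f, n] is {f[1], ..., f[n]} *)
viaTable === viaArray
```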


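In my view, the interesting case is when you do not know the number of elements in advance and have to decide case by case whether something has to be appended or prepended. In that case, Reap and Sow may be worth a look. In general, I would say AppendTo is evil; before using it, look at the alternatives: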
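n = 10.^5 - 1;

(* 1: AppendTo inside Table; Table also keeps every intermediate list AppendTo returns *)
res1 = {};
t1 = First@AbsoluteTiming@Table[With[{y = Sin[x]},
      If[y > 0, AppendTo[res1, y]]], {x, 0, 2 Pi, 2 Pi/n}
     ];

(* 2: mark rejected entries with Hold@Sequence[]; ReleaseHold then lets them vanish *)
{t2, res2} = AbsoluteTiming[With[{r = ReleaseHold@Table[
        With[{y = Sin[x]},
         If[y > 0, y, Hold@Sequence[]]], {x, 0, 2 Pi, 2 Pi/n}]},
    r]];

(* 3: return {} for rejected entries and Flatten them away *)
{t3, res3} = AbsoluteTiming[Flatten@Table[
     With[{y = Sin[x]},
      If[y > 0, y, {}]], {x, 0, 2 Pi, 2 Pi/n}]];

(* 4: Sow accepted values and collect them with Reap *)
{t4, res4} = AbsoluteTiming[First@Last@Reap@Table[With[{y = Sin[x]},
        If[y > 0, Sow[y]]], {x, 0, 2 Pi, 2 Pi/n}]];

(* all four results should agree *)
{res1 == res2, res2 == res3, res3 == res4}
{t1, t2, t3, t4}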

This gives {5.151575, 0.250336, 0.128624, 0.148084} (seconds). The construct
Flatten@Table[ With[{y = Sin[x]}, If[y > 0, y, {}]], ...]

is, fortunately, both readable and fast.
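The examples above still iterate over a known range. For the case where the number of results is not known in advance, Reap and Sow also work inside a While loop. A minimal sketch: collatzTrail is a hypothetical example function that records each value of the Collatz iteration until it reaches 1, so the length of the result is not known beforehand.

```mathematica
(* Record every value of the Collatz iteration starting at n0, ending with 1. *)
collatzTrail[n0_Integer?Positive] := Module[{n = n0},
   Reap[
     While[n != 1,
      Sow[n];
      n = If[EvenQ[n], n/2, 3 n + 1]];
     Sow[1]
     ][[2, 1]]];

collatzTrail[6]
Length[collatzTrail[27]]
```

Sow only records values; the list is assembled once by Reap, so no intermediate copies are made no matter how long the trail turns out to be.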
Comments
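Be careful when trying the last example at home. Here, on my 64-bit Ubuntu with Mma 8.0.4, the AppendTo version with n=10^5 takes 10 GB of memory. With n=10^6 it uses all of my 32 GB of RAM to create an array holding 15 MB of data. Interesting.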
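A likely explanation for the memory use reported in this comment (an inference, not something stated in the thread): Table collects the return value of every iteration, and AppendTo returns the whole updated list, so Table keeps every intermediate copy alive. With roughly 5×10^4 positive values that is on the order of 10^9 retained elements. The small sketch below counts the elements retained that way and shows that Do, which discards its body's values, avoids them. The size k is arbitrary.

```mathematica
k = 1000;

(* AppendTo inside Table: Table stores each returned list (lengths 1, 2, ..., k). *)
resTable = {};
kept = Table[AppendTo[resTable, j], {j, k}];
Total[Length /@ kept]       (* k (k + 1)/2 elements retained in total *)

(* AppendTo inside Do: Do discards the returned lists. *)
resDo = {};
Do[AppendTo[resDo, j], {j, k}];

{resTable === Range[k], resDo === Range[k]}
```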
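+1. To complement this good answer: the fact that Mathematica lists are implemented as arrays has very far-reaching consequences that affect every aspect of Mathematica programming, from programming style (rule-based, functional, etc.) to performance tuning. We should all be very aware of it, especially when writing performance-critical code.

Unfortunately, I think Mathematica is missing the most important structure of all: the struct (or record). This makes data hard to manage in large programs that pass a lot of information between functions, because related data cannot be grouped into a record and must be passed as individual arguments. For small programs this does not matter. I do not see how one can design very large programs without a record data structure, and none of the current solutions for emulating structs in Mathematica is very good.

@Nasser, here is your struct: name[val1, val2, …]. Write selectors such as getVal1[a_name] := a[[1]] and so on, and there you go. Mathematica is to a large extent written in Mathematica (about 1 MLOC), which I think says something.

@rubenko, can one pass such a "struct" to a compiled function?

@Nasser: I use definitions as records; I think of them as hash tables or JavaScript objects. To model key-value pairs "inside" a variable r, do something like r["a"] = 1; r[2] = "b"; ??r then reports all the values. Retrieve values as r["a"] or r[2]. Check for the absence of a value for a key foo with r[foo]==r[foo]. I have written a complete record-handling library (or k-v DB library, if you like) using this trick. Round-tripping to association lists (explicit lists of k-v pairs) is easy. I will post a link later.

At least on Mathematica 8, I got a huge performance improvement by changing Table[Null, {Length[datalist]}] to ConstantArray[0, {Length[datalist]}]. For large inputs the result was at least an order of magnitude faster, and the behavior reflects the expected algorithmic complexity.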
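The two record idioms from the comments above can be sketched as follows. The head point and the functions makePoint, getX, getY and setX are hypothetical names illustrating the name[val1, val2, …] plus selector approach; rec is a hypothetical symbol illustrating the definition-based key-value approach.

```mathematica
(* 1. Positional struct: a custom head plus selector functions. *)
ClearAll[point, makePoint, getX, getY, setX];
makePoint[x_, y_] := point[x, y];
getX[p_point] := p[[1]];
getY[p_point] := p[[2]];
setX[p_point, v_] := ReplacePart[p, 1 -> v];   (* returns a new point; the input is unchanged *)

pt = makePoint[3, 4];
{getX[pt], getY[pt], setX[pt, 7]}

(* 2. Definition-based record: each key is stored as a definition of rec. *)
ClearAll[rec];
rec["a"] = 1;
rec[2] = "b";
{rec["a"], rec[2]}
rec["missing"]   (* no definition, so this stays unevaluated as rec["missing"] *)
```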