How can I speed up nested arrays in Julia?
The function nested_arrays below generates a nested array of "depth" n. However, even for small values of n (2, 3, etc.), running it and displaying the output takes a considerable amount of time:
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> nested_arrays(1)
1-element Array{Int64,1}:
1
julia> nested_arrays(2)
1-element Array{Array{Int64,1},1}:
[1]
julia> nested_arrays(3)
1-element Array{Array{Array{Int64,1},1},1}:
Array{Int64,1}[[1]]
julia> nested_arrays(10)
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]
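Interestingly, when I use the @time macro or put a semicolon (;) at the end of the line, computing the result takes relatively little time. It is actually displaying the result in the REPL that takes up most of the time.
This odd behavior does not show up in Python, for example: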
In [1]: def nested_lists(n):
   ...:     if n == 1:
   ...:         return [1]
   ...:     return [nested_lists(n - 1)]
   ...:
In [2]: nested_lists(10)
Out[2]: [[[[[[[[[[1]]]]]]]]]]
In [3]: %time nested_lists(100)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 37.7 µs
Out[3]: [[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
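Why is this function so slow in Julia? Is Julia recompiling the display function for each different type T in Array{T,1}? If so, why?
Can this code be made faster, or is that simply not possible in Julia? In practical terms, my main concern is something like loading a complicated nested JSON file, where using a plain n-dimensional array is not an option.

Yes, this is entirely due to compile time. You can see this by @time-ing the display. The second time it is displayed, it is fast: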
julia> nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> @time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
11.682721 seconds (8.83 M allocations: 371.698 MB, 1.82% gc time)
julia> @time display(nested_arrays(15));
1-element Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1},1},1}:
Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1},1}[Array{Array{Array{Array{Array{Array{Int64,1},1},1},1},1},1}[Array{Array{Array{Array{Array{Int64,1},1},1},1},1}[Array{Array{Array{Array{Int64,1},1},1},1}[Array{Array{Array{Int64,1},1},1}[Array{Array{Int64,1},1}[Array{Int64,1}[[1]]]]]]]]]]]]]]
0.001688 seconds (2.38 k allocations: 102.766 KB)
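If you want to see the same effect without filling the terminal, one option is to time show writing into an in-memory buffer. This is only a rough sketch: the depth of 8 and the names io, t_first and t_second are my own choices for illustration, and the actual timings depend heavily on the Julia version and the machine.

```julia
nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]

x = nested_arrays(8)              # building the value itself is cheap

io = IOBuffer()                   # write to memory instead of the terminal
t_first  = @elapsed show(io, x)   # includes compiling show for each nested type
t_second = @elapsed show(io, x)   # reuses the methods compiled by the first call

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

In a fresh session the first call should normally be the slower one, since that is where the compilation happens.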
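So why is it so slow? Display here recursively walks through all the arrays and prints the arrays nested inside one another. This calls show recursively with 14 different types: one with 14 nested arrays, then its element with 13 nested arrays, then its element with 12 nested arrays... and so on! Each of those show methods is compiled independently. Compiling specialized methods for specific element types is a key part of how Julia generates efficient code. It means Julia can specialize every operation performed on each element, without any runtime type checks or dispatch. Unfortunately, in this case, it gets in the way.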
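The chain of distinct types can be made visible with a small helper. This is an illustrative sketch only: level_types is a hypothetical name I made up, not something from Base, and it assumes every level is a non-empty vector.

```julia
nested_arrays(n) = n == 1 ? [1] : [nested_arrays(n - 1)]

# Hypothetical helper: collect the concrete type found at each nesting level.
function level_types(x)
    types = Type[]
    while isa(x, AbstractVector)
        push!(types, typeof(x))
        x = first(x)              # descend one level (assumes non-empty arrays)
    end
    return types
end

types = level_types(nested_arrays(5))
for T in types
    println(T)
end
println(length(unique(types)), " distinct array types")
```

Every level has its own concrete type, so a method specialized for one level cannot be reused for the next one down.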
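You can work around this by using Any[] arrays. In the context of a JSON file this makes a lot of sense, since you don't know whether it will contain strings, arrays, numbers, and so on. This is much faster because Julia only needs to compile the show method for Any[] arrays once and then reuses it recursively.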
# new session
julia> nested_arrays(n) = n == 1 ? Any[1] : Any[nested_arrays(n - 1)]
nested_arrays (generic function with 1 method)
julia> @time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
1.571632 seconds (767.12 k allocations: 32.472 MB, 1.04% gc time)
julia> @time display(nested_arrays(15));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]
0.000606 seconds (839 allocations: 30.859 KB)
julia> @time display(nested_arrays(100));
1-element Array{Any,1}:
Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[Any[1]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]
0.002523 seconds (17.76 k allocations: 579.297 KB)
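To check that the workaround really collapses everything onto one container type, you can compare the types at different depths. This is a minimal sketch: nested_any and doc are names and data I invented for illustration, with doc standing in for JSON-like content of mixed element types.

```julia
nested_any(n) = n == 1 ? Any[1] : Any[nested_any(n - 1)]

x = nested_any(5)

# Every level is an Array{Any,1}, regardless of how deep it sits.
@assert typeof(x) == Array{Any,1}
@assert typeof(x[1]) == typeof(x)
@assert typeof(x[1][1][1][1]) == typeof(x)

# A made-up JSON-like value: mixed contents, same container type throughout.
doc = Any["name", 3.5, Any[1, Any["deep", nothing]]]
@assert typeof(doc) == typeof(doc[3]) == typeof(doc[3][2])
```

The trade-off is that code working on the elements has to check their types at run time, which is usually acceptable for data whose shape is not known ahead of time.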
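I'd like to add that this is a case where Julia's tendency to compile specialized versions of functions, which is usually what makes Julia fast, is the wrong thing to do: it would be better to compile just a single, slow, generic version of show for arrays. Python always does this, and in this case it happens to be the right thing to do. In the future, the specialization heuristics could easily be made smarter without changing any language semantics.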