Haskell: profiling a trivial loop
I'm trying to understand how to use profiling. Below is a solution to the USACO 2013 "Line of Sight" problem:
{-# LANGUAGE BangPatterns #-}
import Data.Array.Unboxed
import Data.List
import Data.Int

angle !a | a > 2 * pi = a - 2 * pi
angle !a | a < 0 = a + 2 * pi
angle !a = a

tans :: Int64 -> [[Int64]] -> UArray (Int,Int) Double
tans r cs = listArray ((0,0), (length cs - 1, 1)) $ concatMap f cs where
    f :: [Int64] -> [Double]
    f [x,y] = [angle a2, angle a1] where
        phi | y == 0 = if x < 0 then pi else 0.0
            | otherwise = (fromIntegral $ signum y) * (acos $ (fromIntegral x) / d)
        d = sqrt $ fromIntegral $ x*x + y*y
        z = sqrt $ fromIntegral $ x*x + y*y - r*r
        a1 = phi + (acos $ (fromIntegral r)/d)
        a2 = phi - (acos $ (fromIntegral r)/d)

overlap !a1 !a2 !a1' !a2'
    | a1 < a2 && a1' < a2' = a1 <= a2' && a1' <= a2
    | a1 > a2 && a1' > a2' = overlap (a1 - 2*pi) a2 (a1' - 2*pi) a2'
    | a1 > a2 && a1' <= pi = overlap (a1 - 2*pi) a2 a1' a2'
    | a1 > a2 = overlap a1 (a2 + 2*pi) a1' a2'
    | a1 <= pi = overlap a1 a2 (a1' - 2*pi) a2'
    | otherwise = overlap a1 a2 a1' (a2' + 2 * pi)

solve cows = length $ [ 1
                      | i <- [0..n]
                      , j <- [i+1..n]
                      , let a1 = cows ! (i,0)
                      , let a2 = cows ! (i,1)
                      , let a1' = cows ! (j,0)
                      , let a2' = cows ! (j,1)
                      , overlap a1 a2 a1' a2' ] where
    ((0,0),(n,1)) = bounds cows

main = do
    ls <- getContents
    let ([n, r]: cows ) = map (map read . words) $ lines ls
    print $ solve $ tans r cows
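GHC is unable to fuse the length computation with the construction of the list; that is, it allocates the list cells. If you rewrite solve as an explicit loop, the allocation goes away: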
solve cows = n `seq` go 0 0 1 n
  where
    (_,(n,_)) = bounds cows
    go count i j n | i > n = count
                   | j > n = go count (i+1) (i+2) n
                   | overlap (cows ! (i,0)) (cows ! (i,1)) (cows ! (j,0)) (cows ! (j,1))
                       = go (count + 1) i (j + 1) n
                   | otherwise = go count i (j + 1) n
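The same contrast can be tried on a toy problem that needs no input file. This is only a minimal sketch, not the code from the question: countList, countLoop and the predicate hit (an arbitrary stand-in for overlap) are hypothetical names introduced here. Both functions count the pairs i < j that satisfy the predicate. Whether the list version really allocates depends on the GHC version and optimization flags, so it is worth checking rather than assuming.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Arbitrary stand-in predicate (an assumption for illustration only).
hit :: Int -> Int -> Bool
hit i j = (i + j) `mod` 3 == 0

-- List version: builds a list of hits and takes its length.
countList :: Int -> Int
countList n = length [ () | i <- [0..n], j <- [i+1..n], hit i j ]

-- Explicit-loop version: strict accumulator, no intermediate list.
countLoop :: Int -> Int
countLoop n = go 0 0 1
  where
    go !acc !i !j
      | i > n     = acc                      -- all rows done
      | j > n     = go acc (i + 1) (i + 2)   -- next row, restart j
      | hit i j   = go (acc + 1) i (j + 1)
      | otherwise = go acc i (j + 1)
```

To compare the two, one option is to call each from a small main, build with -O2 -rtsopts, and look at the "bytes allocated in the heap" figure printed by +RTS -s.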
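As for why the allocation is attributed to a1' and a2', I don't know.
CPU usage is dominated by the go function, which probably means the array accesses; overlap accounts for only about 15% of the total run time.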
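One way to test the guess that the time goes into array accesses rather than the comparison is to put manual cost centres on each of them. The following is a sketch under assumptions: countOverlaps and overlapSimple are hypothetical names, and overlapSimple is a simplified test that ignores wrap-around, not the overlap from the question. Bear in mind that SCC annotations can themselves get in the way of optimizations, so the resulting numbers are a guide rather than a measurement of the unprofiled program.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Build with profiling, e.g.:  ghc -O2 -prof -fprof-auto -rtsopts Prog.hs
-- Run with:                    ./Prog +RTS -p -RTS < input
-- (Prog.hs and input are placeholder names.)
import Data.Array.Unboxed

-- Simplified stand-in for overlap (assumption: intervals never wrap around).
overlapSimple :: Double -> Double -> Double -> Double -> Bool
overlapSimple a1 a2 b1 b2 = a1 <= b2 && b1 <= a2

countOverlaps :: UArray (Int, Int) Double -> Int
countOverlaps cows = go 0 0 1
  where
    (_, (n, _)) = bounds cows
    go !acc !i !j
      | i > n = acc
      | j > n = go acc (i + 1) (i + 2)
      | otherwise =
          -- Each SCC shows up as its own row in the .prof report.
          let a1  = {-# SCC "read_i0" #-} (cows ! (i, 0))
              a2  = {-# SCC "read_i1" #-} (cows ! (i, 1))
              b1  = {-# SCC "read_j0" #-} (cows ! (j, 0))
              b2  = {-# SCC "read_j1" #-} (cows ! (j, 1))
              hit = {-# SCC "overlap_test" #-} (overlapSimple a1 a2 b1 b2)
          in go (if hit then acc + 1 else acc) i (j + 1)
```

Without -prof the SCC pragmas are ignored, so the same source can be built both ways.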
Edit: here is a (less readable) version in which two of the array accesses are moved out of the inner loop:
solve !cows = n `seq` go 0 0
  where
    (_,(n,_)) = bounds cows
    go !count !i | i >= n = count
                 | otherwise = go2 count i (i+1) (cows ! (i,0)) (cows ! (i,1))
    go2 !count !i !j !a1 !a2 | j > n = go count (i+1)
                             | overlap a1 a2 (cows ! (j,0)) (cows ! (j,1))
                                 = go2 (count+1) i (j+1) a1 a2
                             | otherwise = go2 count i (j+1) a1 a2
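I have some questions about this. 1. Array access costs 4 times more than a series of tests? 2. If that's all there is to it, why does it run for 3 seconds? (For 6.in it takes more than 15 seconds, and that is only a few million array accesses and comparisons.) 3. The allocation is still attributed to a1, a2, a1' and a2' (I kept them instead of passing the arguments directly). 4. Rewriting with "go" did not speed up the program.

You didn't ask how to speed up the program, only where the allocation and the time go. Although the data in 5.in should fit in the L2 cache, accessing it still takes a considerable amount of time compared to a few branches (assuming a high branch prediction rate). Note that my version is actually slower than the original, because the array accesses with index i are not moved out of the inner j loop. Added a faster version. Also, compile floating-point code with -fllvm.

A C version is only 3 times faster; to get the Haskell version any faster you would need low-level code tuning. By the way, profiling the C version shows 80% of the time in overlap.

But I would expect the proof of "where the problem is" to show up as a change in execution speed :) OK, I'll try -fllvm.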
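As one possible example of the low-level tuning mentioned above (a suggestion of mine, not something measured in this thread), the bounds-checked (!) can be replaced by unsafeAt from Data.Array.Base, which indexes the flat underlying storage without a bounds check. The sketch assumes the array bounds are exactly ((0,0),(n,1)), so element (i,k) lives at flat index 2*i + k. countUnchecked and overlapSimple are hypothetical names, and overlapSimple is a simplified stand-in for overlap that ignores wrap-around.

```haskell
{-# LANGUAGE BangPatterns #-}
-- Optionally build with:  ghc -O2 -fllvm Prog.hs   (needs LLVM installed)
import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed

-- Simplified stand-in for overlap (assumption: intervals never wrap around).
overlapSimple :: Double -> Double -> Double -> Double -> Bool
overlapSimple a1 a2 b1 b2 = a1 <= b2 && b1 <= a2

countUnchecked :: UArray (Int, Int) Double -> Int
countUnchecked cows = go 0 0
  where
    (_, (n, _)) = bounds cows
    -- Assumption: bounds are ((0,0),(n,1)), so (i,k) maps to 2*i + k.
    -- No bounds check happens here; a wrong index reads arbitrary memory.
    at i k = unsafeAt cows (2 * i + k)
    go !acc !i
      | i >= n    = acc
      | otherwise = inner acc (i + 1) (at i 0) (at i 1)
      where
        -- Row i's two values are read once and carried through the j loop.
        inner !c !j !a1 !a2
          | j > n = go c (i + 1)
          | overlapSimple a1 a2 (at j 0) (at j 1) = inner (c + 1) (j + 1) a1 a2
          | otherwise = inner c (j + 1) a1 a2
```

Whether this helps at all should be checked by timing both versions on the same input; if the bounds checks are not what costs the time, the unsafe version only adds risk.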