为什么Haskell代码在使用-O时运行较慢？_Haskell_Optimization_Ghc_Compiler Bug

为什么Haskell代码在使用-O时运行较慢？

haskell optimization

为什么Haskell代码在使用-O时运行较慢？,haskell,optimization,ghc,compiler-bug,Haskell,Optimization,Ghc,Compiler Bug,这段Haskell代码在使用-O时运行速度要慢得多，但是-O应该是。谁能告诉我发生了什么事？如果重要的话，这是一种尝试，它使用二进制搜索和持久段树： import Control.Monad import Data.Array data Node = Leaf Int -- value | Branch Int Node Node -- sum, left child, right child type NodeArray = Array Int N

这段Haskell代码在使用

-O

时运行速度要慢得多，但是

-O

应该是。谁能告诉我发生了什么事？如果重要的话，这是一种尝试，它使用二进制搜索和持久段树：

import Control.Monad
import Data.Array

data Node =
      Leaf   Int           -- value
    | Branch Int Node Node -- sum, left child, right child
type NodeArray = Array Int Node

-- create an empty node with range [l, r)
create :: Int -> Int -> Node
create l r
    | l + 1 == r = Leaf 0
    | otherwise  = Branch 0 (create l m) (create m r)
    where m = (l + r) `div` 2

-- Get the sum in range [0, r). The range of the node is [nl, nr)
sumof :: Node -> Int -> Int -> Int -> Int
sumof (Leaf val) r nl nr
    | nr <= r   = val
    | otherwise = 0
sumof (Branch sum lc rc) r nl nr
    | nr <= r   = sum
    | r  > nl   = (sumof lc r nl m) + (sumof rc r m nr)
    | otherwise = 0
    where m = (nl + nr) `div` 2

-- Increase the value at x by 1. The range of the node is [nl, nr)
increase :: Node -> Int -> Int -> Int -> Node
increase (Leaf val) x nl nr = Leaf (val + 1)
increase (Branch sum lc rc) x nl nr
    | x < m     = Branch (sum + 1) (increase lc x nl m) rc
    | otherwise = Branch (sum + 1) lc (increase rc x m nr)
    where m = (nl + nr) `div` 2

-- signature said it all
tonodes :: Int -> [Int] -> [Node]
tonodes n = reverse . tonodes' . reverse
    where
        tonodes' :: [Int] -> [Node]
        tonodes' (h:t) = increase h' h 0 n : s' where s'@(h':_) = tonodes' t
        tonodes' _ = [create 0 n]

-- find the minimum m in [l, r] such that (predicate m) is True
binarysearch :: (Int -> Bool) -> Int -> Int -> Int
binarysearch predicate l r
    | l == r      = r
    | predicate m = binarysearch predicate l m
    | otherwise   = binarysearch predicate (m+1) r
    where m = (l + r) `div` 2

-- main, literally
main :: IO ()
main = do
    [n, m] <- fmap (map read . words) getLine
    nodes <- fmap (listArray (0, n) . tonodes n . map (subtract 1) . map read . words) getLine
    replicateM_ m $ query n nodes
    where
        query :: Int -> NodeArray -> IO ()
        query n nodes = do
            [p, k] <- fmap (map read . words) getLine
            print $ binarysearch (ok nodes n p k) 0 n
            where
                ok :: NodeArray -> Int -> Int -> Int -> Int -> Bool
                ok nodes n p k s = (sumof (nodes ! min (p + s + 1) n) s 0 n) - (sumof (nodes ! max (p - s) 0) s 0 n) >= k

这是堆配置文件摘要：

$ghc-fforce recomp-rtsopts./1827.hs
[1/1]主要汇编（1827.hs，1827.o）
正在链接1827.exe。。。
$./gen.exe 1000 |./1827.exe+RTS-s>/dev/null
堆中分配的70207096字节
在GC期间复制2112416字节
613368字节最大驻留时间（3个示例）
28816字节最大斜率
使用中的总内存为3 MB（由于碎片而丢失0 MB）
Tot时间（已用）平均暂停最大暂停
Gen 0 132 Cels，0面值0.00 s 0.00 s 0.00 00 s 0.000 04s
Gen 1 3 Cels，0个PAR 0.00 s 0.00 s 0.000 06S 0.00 10S
初始时间0.00s（经过0.00s）
MUT时间0.03s（经过0.03s）
GC时间0.00s（经过0.01s）
退出时间0.00s（经过0.00s）
总时间0.03秒（经过0.04秒）
%GC时间0.0%（已用14.7%）
每分钟分配速率2250213011字节
生产力占总用户的100.0%，占总运行时间的83.1%
$ghc-fforce建议-O-rtsopts./1827.hs
[1/1]主要汇编（1827.hs，1827.o）
正在链接1827.exe。。。
$./gen.exe 1000 |./1827.exe+RTS-s>/dev/null
在堆中分配的6009233608字节
在GC期间复制的622682200字节
443240字节最大驻留时间（505个样本）
48256字节最大斜率
使用中的总内存为3 MB（由于碎片而丢失0 MB）
Tot时间（已用）平均暂停最大暂停
Gen 0 10945张，0个PAR 0.72S 0.63S 0.000 01S 0.000 04S
Gen 1 505 Cels，0 PAR 0.16S 0.13S 0.000 03S 0.000 05S
初始时间0.00s（经过0.00s）
MUT时间2.00s（经过2.13s）
GC时间0.87s（经过0.76s）
退出时间0.00s（经过0.00s）
总时间2.89秒（经过2.90秒）
%GC时间30.3%（已用26.4%）
分配速率为每秒3009412603字节
生产力占总用户的69.7%，占总消耗量的69.4%

使用

-O

让我放大您的主函数，并稍微重写它：

main :: IO ()
main = do
    [n, m] <- fmap (map read . words) getLine
    line <- getLine
    let nodes = listArray (0, n) . tonodes n . map (subtract 1) . map read . words $ line
    replicateM_ m $ query n nodes

你可以马上看到这里的问题

什么是国家黑客，为什么它破坏我的程序性能原因是statehack，它（粗略地）说：“当某个东西的类型是

ioa

，假设它只被调用一次。”。问题并不复杂：

-fno state hack

关闭“state hack”，即任何带有state#token作为参数的lambda都被认为是单个条目，因此可以在其中内联内容。这可以提高IO和ST monad代码的性能，但会降低共享的风险

大致来说，其思想如下：如果您使用

IO

类型和where子句定义函数，例如

foo x = do
    putStrLn y
    putStrLn y
  where y = ...x...

ioa类型的东西可以看作是RealWord->（a，RealWorld）
类型的东西。在该视图中，上述内容（大致）变为
对foo
的调用（通常）如下所示。但是foo
的定义只接受一个参数，而另一个参数只在以后由本地lambda表达式使用！这将是一个非常缓慢的调用foo
。如果代码如下所示，则速度会快得多：
foo x world1 = 
   let y = ...x... in 
   let (world2, ()) = putStrLn y world1
   let (world3, ()) = putStrLn y world2
   in  (world3, ())

这被称为eta扩展，并在不同的基础上进行（例如，通过、通过和——在本例中是通过——类型导向启发式）
不幸的是，如果对foo
的调用的形式实际上是let fooArgument=foo argument
，即带有一个参数，但没有传递world
，则会降低性能。在原始代码中，如果随后多次使用fooArgument
，y
仍将只计算一次并共享。在修改后的代码中，y
每次都将重新计算–这正是节点发生的情况
事情能解决吗？
可能吧。请参阅，以了解这样做的尝试。修复它的问题是，在很多情况下，如果转换发生在ok上，则会降低性能，即使编译器不可能确定这一点。在某些情况下，可能在技术上不可行，即共享丢失，但这仍然是有益的，因为更快的呼叫带来的加速超过了重新计算的额外成本。因此，现在还不清楚该从何处着手。
感谢您将GHC版本包括在内@结果现在已内联到我的问题中。还有一个选项可以尝试：-fno state hack
。我不知道太多的细节，但基本上这是一种猜测程序创建的某些函数（即隐藏在IO
或ST
类型中的函数）只被调用一次的启发式方法。这通常是一个好的猜测，但当它是一个坏的猜测，GHC可以产生非常糟糕的代码。很长一段时间以来，开发人员一直在试图找到一种既有好处又有坏处的方法。我想约阿希姆·布雷特纳最近在做这件事，看起来很像。请注意，这两个程序都使用了replicateM\uu
，GHC会错误地将计算从replicateM\u
外部移动到内部，从而重复它。非常有趣！但我还不太明白为什么：“另一个只会在稍后被本地lambda表达式使用！这将是一个非常缓慢的调用foo
”？对于特定的本地情况，有什么解决方法吗<代码>-f-no-state-hack
编译时似乎相当沉重foo x = 
   let y = ...x... in 
   \world1 ->
     let (world2, ()) = putStrLn y world1
     let (world3, ()) = putStrLn y world2
     in  (world3, ())


foo x world1 = 
   let y = ...x... in 
   let (world2, ()) = putStrLn y world1
   let (world3, ()) = putStrLn y world2
   in  (world3, ())