Why is my program faster with one core than with two?

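I am currently trying to understand how to program in parallel in Haskell. I am following the paper "A Tutorial on Parallel and Concurrent Programming in Haskell" by Simon Peyton Jones and Satnam Singh. The source code is as follows: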

module Main where
import Control.Parallel
import System.Time

main :: IO ()
main = do
      putStrLn "Starting computation....."
      t0 <- getClockTime
      pseq r1 (return())
      t1 <- getClockTime
      putStrLn ("sum: " ++ show r1)
      putStrLn ("time: " ++ show (secDiff t0 t1) ++ " seconds")
      putStrLn "Finish."

r1 :: Int
r1 = parSumFibEuler 38 5300

-- This is the Fibonacci number generator
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

-- Gets the euler sum
mkList :: Int -> [Int]
mkList n = [1..n-1]

relprime :: Int -> Int -> Bool
relprime x y = gcd x y == 1

euler :: Int -> Int
euler n = length $ filter (relprime n) (mkList n)

sumEuler :: Int -> Int
sumEuler = sum.(map euler).mkList

-- Gets the sum of Euler and Fibonacci (NORMAL)
sumFibEuler :: Int -> Int -> Int
sumFibEuler a b = fib a + sumEuler b

-- Gets the sum of Euler and Fibonacci (PARALLEL)
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b =
  f `par` (e `pseq`(f+e))
    where
    f = fib a
    e = sumEuler b

-- Measure time
secDiff :: ClockTime -> ClockTime -> Float
secDiff (TOD secs1 psecs1) (TOD secs2 psecs2)
  = fromInteger (psecs2 -psecs1) / 1e12 + fromInteger (secs2- secs1)
a) Run it with 1 core:

./Main +RTS -N1
b) Run it with 2 cores:

./Main +RTS -N2
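However, the one-core run took 53.556 seconds, while the two-core run took 73.401 seconds. I don't understand why 2 cores run slower than 1 core. Maybe the message-passing overhead is too large for such a small program? The results in the paper are completely different from mine. Here are the output details.

For 1 core: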

Starting computation.....
sum: 47625790
time: 53.556335 seconds
Finish.
  17,961,210,216 bytes allocated in the heap
      12,595,880 bytes copied during GC
         176,536 bytes maximum residency (3 sample(s))
          23,904 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     34389 colls,     0 par    2.54s    2.57s     0.0001s    0.0123s
  Gen  1         3 colls,     0 par    0.00s    0.00s     0.0007s    0.0010s

  Parallel GC work balance: -nan (0 / 0, ideal 1)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
  Task  1 (worker) :    0.00s    ( 53.56s)       0.00s    (  0.00s)
  Task  2 (bound)  :   50.49s    ( 50.99s)       2.52s    (  2.57s)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   50.47s  ( 50.99s elapsed)
  GC      time    2.54s  (  2.57s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   53.02s  ( 53.56s elapsed)

  Alloc rate    355,810,305 bytes per MUT second

  Productivity  95.2% of total user, 94.2% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
For 2 cores:

Starting computation.....
sum: 47625790
time: 73.401146 seconds
Finish.
  17,961,210,256 bytes allocated in the heap
      12,558,088 bytes copied during GC
         176,536 bytes maximum residency (3 sample(s))
         195,936 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     34389 colls, 34388 par    7.42s    4.73s     0.0001s    0.0205s
  Gen  1         3 colls,     3 par    0.01s    0.00s     0.0011s    0.0017s

  Parallel GC work balance: 1.00 (1432193 / 1429197, ideal 2)

                        MUT time (elapsed)       GC time  (elapsed)
  Task  0 (worker) :    1.19s    ( 40.26s)      16.95s    ( 33.15s)
  Task  1 (worker) :    0.00s    ( 73.40s)       0.00s    (  0.00s)
  Task  2 (bound)  :   54.50s    ( 68.67s)       3.66s    (  4.73s)
  Task  3 (worker) :    0.00s    ( 73.41s)       0.00s    (  0.00s)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   68.87s  ( 68.67s elapsed)
  GC      time    7.43s  (  4.73s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   76.31s  ( 73.41s elapsed)

  Alloc rate    260,751,318 bytes per MUT second

  Productivity  90.3% of total user, 93.8% of total elapsed

gc_alloc_block_sync: 12254
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
I believe you mean

r1 = parSumFibEuler 38 5300
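
To make the distinction concrete, here is a minimal sketch that puts the sequential and the parallel definition from the question side by side. It assumes the parallel package is installed (for Control.Parallel). Only the second version calls `par`, so only it can create a spark. The `SPARKS` line printed by `+RTS -s` is one place to check this: both runs in the question report `SPARKS: 0`, which is what one would expect if the sequential `sumFibEuler` had been run.

```haskell
import Control.Parallel (par, pseq)

-- Same definitions as in the question, lightly condensed.
fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

euler :: Int -> Int
euler n = length (filter (\k -> gcd n k == 1) [1 .. n - 1])

sumEuler :: Int -> Int
sumEuler n = sum (map euler [1 .. n - 1])

-- Sequential version: never calls `par`, so no spark is created.
sumFibEuler :: Int -> Int -> Int
sumFibEuler a b = fib a + sumEuler b

-- Parallel version: sparks `f`, evaluates `e` on the current thread,
-- and only then adds the two results.
parSumFibEuler :: Int -> Int -> Int
parSumFibEuler a b = f `par` (e `pseq` (f + e))
  where
    f = fib a
    e = sumEuler b
```

Both functions return the same value; they differ only in whether the runtime is given a chance to evaluate `fib a` on another core.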

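In my configuration (using parSumFibEuler 45 8000 and running only once):

  • with -N1: 126.83s
  • with -N2: 115.46s

I suspect the fib function consumes far more CPU than sumEuler. That would explain the small improvement with -N2. There won't be any work stealing in your case.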
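
One way to check this suspicion is to time the two halves separately. With only two pieces of work, the elapsed time on two cores cannot drop below the time of the longer piece, so a large imbalance leaves little room for speedup. The sketch below is an assumption-laden illustration, not part of the original program: `timed` and `compareHalves` are hypothetical helper names, and it uses Data.Time.Clock from the time package rather than the System.Time module used in the question.

```haskell
import Control.Exception (evaluate)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

euler :: Int -> Int
euler n = length (filter (\k -> gcd n k == 1) [1 .. n - 1])

sumEuler :: Int -> Int
sumEuler n = sum (map euler [1 .. n - 1])

-- Hypothetical helper: force a value to weak head normal form
-- (which is a full evaluation for an Int) and measure the wall-clock time.
timed :: a -> IO (a, NominalDiffTime)
timed x = do
  t0 <- getCurrentTime
  y  <- evaluate x
  t1 <- getCurrentTime
  return (y, diffUTCTime t1 t0)

-- Hypothetical helper: report how long each half takes on its own.
compareHalves :: Int -> Int -> IO ()
compareHalves a b = do
  (f, tf) <- timed (fib a)
  (e, te) <- timed (sumEuler b)
  putStrLn ("fib " ++ show a ++ " = " ++ show f ++ " took " ++ show tf)
  putStrLn ("sumEuler " ++ show b ++ " = " ++ show e ++ " took " ++ show te)
```

Calling `compareHalves 38 5300` from `main` or from GHCi prints both timings; if one is several times the other, the best case for two cores is close to the larger of the two.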

With memoization, your Fibonacci function would be much faster, but I don't think that's what you wanted to try.
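
For reference, a minimal sketch of what memoization could look like here, using a lazily built list so that each Fibonacci number is computed only once. The names `fibs` and `fibMemo` are illustrative, and Integer is used instead of Int only to avoid overflow for large arguments.

```haskell
-- Lazily built list of Fibonacci numbers; each element is computed once
-- and then shared by later lookups.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

-- Look up the n-th Fibonacci number (n must be non-negative).
fibMemo :: Int -> Integer
fibMemo n = fibs !! n
```

This makes `fibMemo 38` effectively instantaneous, which is exactly why it would defeat the purpose of the benchmark: the naive `fib` is there to provide a large chunk of CPU work.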

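Edit: as mentioned in the comments, I think that with -N2 you get a lot of interruptions, since you have two cores available.

Using sum $ parMap rdeepseq (fib) [1..40]:

  • with -N1 it takes ~26s
  • with -N2 it takes ~16s
  • with -N3 it takes ~13s
  • with -N4 it takes ~30s (well, the Haskell program is not alone here)

From:

Be careful when using all the processors in your machine: if some of your processors are in use by other programs, this can actually harm performance rather than improve it.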
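
The parMap experiment above can be written out as a small module. This is a sketch under the assumption that the parallel package is available (it provides Control.Parallel.Strategies); `parFibSum` is a name chosen here for illustration.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n - 1) + fib (n - 2)

-- Creates one spark per list element; rdeepseq fully evaluates each
-- result inside its spark, so the final sum only adds finished numbers.
parFibSum :: Int -> Int
parFibSum n = sum (parMap rdeepseq fib [1 .. n])
```

Adding `main = print (parFibSum 40)` turns this into a standalone program. Building it with `ghc -O2 -threaded -rtsopts` and running it with `+RTS -N2 -s` should then show a non-zero `SPARKS` count. Because there are many sparks of different sizes rather than two fixed halves, idle cores have something to pick up, which is likely why this version scales better up to the number of free cores.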


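Do you have a CPU with two cores?

Yes, it's a laptop and the CPU is an APU E350. My machine is an Asus 1215B. This time I tried again with bigger numbers, r1 = 45 8000. The result was 349.2701 seconds on 1 core and 530.3064 seconds on 2 cores. And yes, I think the ratio between the two functions may play a big part in the speedup.

Since your machine has two cores, and -N specifies the number of OS threads (mapped onto cores) that the application can use, you may also be suffering from interference from other applications.

I think that makes sense. I will try it on my other machine, which is an i5.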