What is the fastest way to parse a line with lots of Ints in Haskell?


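I have been learning Haskell for two years now, and I am still unsure what the best (fastest) way is to read a large number of integers from a single input line. To practice, I signed up at hackerearth.com and try to solve every challenge in Haskell. Now I am stuck on one challenge because I run into timeouts: my program is too slow to be accepted by the site.

Using the profiler, I found that parsing a line containing a lot of integers takes more than 80% of the run time. The percentage gets even higher as the number of values on the line grows.

This is how I currently read the numbers from the input line: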

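import qualified Data.ByteString.Char8 as C8
import Data.Maybe (fromJust)  -- needed for fromJust below

main = do
   -- split the line into words and parse each word as an Int
   scores <- fmap (map (fst . fromJust . C8.readInt) . C8.words) C8.getLine :: IO [Int]
   print (length scores)  -- placeholder (not in the original): a do block must end in an expression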

[…] 60 seconds).

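It is generally hard to declare one particular method "the fastest", since there is almost always a way to squeeze out more performance. However, an approach using Data.ByteString.Char8 and the general method you suggest should be among the fastest ways to read numbers. If you see poor performance, the problem probably lies elsewhere.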

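To give some concrete results, I generated a 191 MB file containing 20 million 9-digit numbers, space-separated on a single line. I then tried several general methods of reading a line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String is: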

reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
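A similar approach using Text (readText in Sum.hs below) ran in 2.4 seconds (but note that it would need to be modified to handle negative numbers!); and the same approach using Char8: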

import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n

ran in 2.5 seconds, on par with the Text implementation.
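The caveat about negative numbers arises because T.decimal does not accept a leading sign. A minimal sketch of one way to handle signs, wrapping the reader in T.signed from Data.Text.Read; the name parseSigned and the sample input are assumptions for illustration and are not part of the benchmark:

```haskell
import qualified Data.Text as T
import qualified Data.Text.Read as T

-- Hypothetical helper (not from the original answer): parse every
-- whitespace-separated integer on a line, accepting a leading '-' or '+'.
parseSigned :: T.Text -> [Int]
parseSigned = map parse . T.words
  where
    parse s = case T.signed T.decimal s of
      Right (n, _) -> n
      Left err     -> error ("not an integer: " ++ err)

main :: IO ()
main = print (parseSigned (T.pack "12 -7 +30"))
```

The effect of T.signed on the timings above was not measured.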

You can squeeze a bit more performance out of the Char8 implementation by using a hand-rolled parser:

readChar8' :: IO [Int]
readChar8' = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)
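To try the hand-rolled parser without a data file, here is a pure variant as a sketch. It assumes isSpace from Data.Char (base) in place of the Data.Char8 import used in the full listing, and the name parseInts is hypothetical:

```haskell
import qualified Data.ByteString.Char8 as C
import Data.Char (isSpace)
import Data.List (unfoldr)

-- Hypothetical pure variant of the hand-rolled parser: repeatedly read
-- one Int, then skip the whitespace that follows it. unfoldr stops as
-- soon as C.readInt returns Nothing (end of input or a non-number).
parseInts :: C.ByteString -> [Int]
parseInts = unfoldr go . C.dropWhile isSpace
  where
    go s = do
      (n, rest) <- C.readInt s
      return (n, C.dropWhile isSpace rest)

main :: IO ()
main = print (parseInts (C.pack "100000000 -42  7\n"))
```

Because C.readInt does not skip leading whitespace, the sketch drops it once up front and again after each number.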
Here is the full program, Sum.hs, which reads the numbers with the chosen method and prints their sum:

import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char8 as C
import qualified Data.ByteString.Char8 as C
import System.Environment

-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs

readString :: IO [Int]
readString = map read . words <$> getLine

readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n

readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n

readHand :: IO [Int]
readHand = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)

main = do [method] <- getArgs
          let reader = case method of
                "string" -> readString
                "text"   -> readText
                "char8"  -> readChar8
                "hand"   -> readHand
          print =<< sum' <$> reader

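Comments:

Are you compiling with optimizations? Also, if I remember correctly, I think bytestring is faster than bytestring.char8.

I don't know what the challenge is, but it is usually better to consume the input while transforming it than to read it all and then apply some function.

Have you looked at attoparsec?

How did you arrive at 60 seconds? On my machine, your code runs in roughly 0.3 seconds (compiled with ghc -O0).

FTR, creating a file of 200k integers in decimal string form is pretty crazy. That is an amount you would definitely want to store in binary form.

@epsilonhalbe: No idea how hackerearth compiles my code. Do you have an optimized example?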
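The comment about consuming the input while transforming it could be read as: accumulate the result while parsing, instead of building a list first. A minimal sketch under that assumption, for the sum use case only; sumInts is a hypothetical name and this variant was not part of the timings:

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as C
import Data.Char (isSpace)

-- Hypothetical helper: sum all whitespace-separated Ints in a
-- ByteString with a strict accumulator, without an intermediate list.
sumInts :: C.ByteString -> Int
sumInts = go 0 . C.dropWhile isSpace
  where
    go !acc s = case C.readInt s of
      Nothing        -> acc
      Just (n, rest) -> go (acc + n) (C.dropWhile isSpace rest)

main :: IO ()
main = print (sumInts (C.pack "1 2 3 -4"))
```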
Numbers.hs, to generate the data:

-- |Generate 20M 9-digit numbers:
--    ./Numbers 20000000 100000000 999999999 > data1.txt

import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random

main :: IO ()
main = do [n, a, b] <- map read <$> getArgs
          nums <- replicateM n (randomRIO (a,b))
          let _ = nums :: [Int]
          C.putStrLn (C.unwords (map (C.pack . show) nums))

Timings for each method:

./Sum string <data1.txt   # 54.3 secs
./Sum text   <data1.txt   # 2.29 secs
./Sum char8  <data1.txt   # 1.34 secs
./Sum hand   <data1.txt   # 0.91 secs