Haskell: What is the fastest way to parse a line with lots of ints?
I have been learning Haskell for two years now, but I am still unsure what the best (fastest) way is to read a large number of values from a single input line. To practise, I registered at hackerearth.com and try to solve every challenge in Haskell. Now I am stuck on one challenge because I keep running into timeouts: my program is too slow to be accepted by the site. Using the profiler, I found that parsing a line containing a lot of integers takes more than 80% of the run time, and the percentage gets even higher as the number of values in the line grows. This is how I currently read the numbers from the input line:
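import Data.Maybe (fromJust)
import qualified Data.ByteString.Char8 as C8
main = do
    scores <- fmap (map (fst . fromJust . C8.readInt) . C8.words) C8.getLine :: IO [Int]
    print (sum scores)  -- placeholder: the actual solution would use scores here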
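[...] 60 secs).

It's often difficult to declare a particular approach "the fastest", since there is almost always some way to squeeze out more performance. However, the approach using Data.ByteString.Char8 and the general method you suggest should be among the fastest ways to read numbers. If you run into poor performance, the problem likely lies elsewhere.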
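To give some concrete results, I generated a 191 MB file containing 20 million 9-digit numbers, space-separated on a single line. I then tried several general methods of reading a line of numbers and printing their sum (which, for the record, was 10999281565534666). The obvious approach using String is: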
reader :: IO [Int]
reader = map read . words <$> getLine
sum' xs = sum xs -- work around GHC ticket 10992
main = print =<< sum' <$> reader
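The String version is far slower than the alternatives (see the timings at the end). A similar approach using Text (readText in Sum.hs) took 2.4 secs (but note that it would need to be modified to handle negative numbers!); and the same approach using Char8: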
import qualified Data.ByteString.Char8 as C
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n
ran in 2.5 secs, on par with the Text implementation.
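The negative-number caveat for the Text reader comes from T.decimal, which only reads unsigned digits. A minimal sketch of one way to accept signs, assuming it is acceptable to abort on a malformed token; parseSigned is a hypothetical name, not part of the original answer, and this variant was not timed:

```haskell
import qualified Data.Text as T
import qualified Data.Text.Read as T

-- Parse whitespace-separated integers, accepting an optional leading
-- '+' or '-' by wrapping T.decimal in T.signed.
-- A token that fails to parse raises an error here; returning Maybe or
-- Either instead may suit a real solution better.
parseSigned :: T.Text -> [Int]
parseSigned = map parse . T.words
  where
    parse s = case T.signed T.decimal s of
      Right (n, _) -> n
      Left err     -> error ("parseSigned: " ++ err)
```

To read from standard input, it could be combined with T.getLine from Data.Text.IO, for example parseSigned <$> T.getLine.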
You can squeeze some more performance out of the Char8 implementation by using a hand-written parser:
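-- requires: import Data.List (unfoldr) and import qualified Data.Char as C (for isSpace)
readChar8' :: IO [Int]
readChar8' = parse <$> C.getLine
  where parse = unfoldr go
        -- read one Int, then skip the whitespace before the next one
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)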
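To make the behaviour of the unfoldr-based parser easier to check in isolation, here is a sketch of the same idea as a pure function. The name parseInts and the extra step of dropping leading whitespace are assumptions added for illustration (C.readInt does not skip leading spaces itself):

```haskell
import Data.Char (isSpace)
import Data.List (unfoldr)
import qualified Data.ByteString.Char8 as C

-- Repeatedly read one Int from the front of the buffer, then skip the
-- whitespace that follows it. unfoldr stops as soon as C.readInt returns
-- Nothing, i.e. at the end of the input or at a non-numeric token.
parseInts :: C.ByteString -> [Int]
parseInts = unfoldr go . C.dropWhile isSpace
  where
    go s = do
      (n, rest) <- C.readInt s
      return (n, C.dropWhile isSpace rest)
```

Note that parsing stops silently at the first token that is not a number, so malformed input yields a shorter list rather than an error.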
The sums were computed with Sum.hs:
import Data.List
import qualified Data.Text as T
import qualified Data.Text.IO as T
import qualified Data.Text.Read as T
import qualified Data.Char as C  -- provides C.isSpace
import qualified Data.ByteString.Char8 as C
import System.Environment
-- work around https://ghc.haskell.org/trac/ghc/ticket/10992
sum' xs = sum xs
readString :: IO [Int]
readString = map read . words <$> getLine
readText :: IO [Int]
readText = map parse . T.words <$> T.getLine
  where parse s = let Right (n, _) = T.decimal s in n
readChar8 :: IO [Int]
readChar8 = map parse . C.words <$> C.getLine
  where parse s = let Just (n, _) = C.readInt s in n
readHand :: IO [Int]
readHand = parse <$> C.getLine
  where parse = unfoldr go
        go s = do (n, s1) <- C.readInt s
                  let s2 = C.dropWhile C.isSpace s1
                  return (n, s2)
-- select the reader by command-line argument: string, text, char8 or hand
main = do
    [method] <- getArgs
    let reader = case method of
            "string" -> readString
            "text" -> readText
            "char8" -> readChar8
            "hand" -> readHand
    print =<< sum' <$> reader
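Did you compile with optimizations (?)? Also, if I remember correctly, I think ByteString is faster than ByteString.Char8. I don't know what the challenge is, but it is usually better to consume the input while transforming it than to read it all and then apply some function.
Have you looked at attoparsec?
How did you arrive at 60 secs? On my machine, your code runs in about 0.3 secs (compiled with ghc -O0). FTR, creating a file of 200k integers in decimal string form is pretty insane. That is an amount of data you would definitely want to store in binary form.
@epsilonhalbe: I have no idea how HackerEarth compiles my code. Do you have an optimized example?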
The test data was generated with Numbers.hs:
-- |Generate 20M 9-digit numbers:
-- ./Numbers 20000000 100000000 999999999 > data1.txt
import qualified Data.ByteString.Char8 as C
import Control.Monad
import System.Environment
import System.Random
main :: IO ()
main = do
    -- arguments: how many numbers, lower bound, upper bound
    [n, a, b] <- map read <$> getArgs
    nums <- replicateM n (randomRIO (a,b))
    let _ = nums :: [Int]
    C.putStrLn (C.unwords (map (C.pack . show) nums))
Timings for each reader in Sum.hs:
./Sum string <data1.txt   # 54.3 secs
./Sum text <data1.txt     # 2.29 secs
./Sum char8 <data1.txt    # 1.34 secs
./Sum hand <data1.txt     # 0.91 secs
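One of the comments suggests consuming the input while transforming it instead of reading everything first and applying a function afterwards. The following is a minimal sketch of that idea for the summing task, assuming only the total is needed; sumInts is a hypothetical name, and this variant is not part of the original benchmark and was not timed:

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Char (isSpace)
import qualified Data.ByteString.Char8 as C

-- Sum the integers in a buffer with a strict accumulator, so that no
-- intermediate list of Ints is built. Parsing stops at the end of the
-- input or at the first token that C.readInt cannot read.
sumInts :: C.ByteString -> Int
sumInts = go 0 . C.dropWhile isSpace
  where
    go !acc s = case C.readInt s of
      Nothing        -> acc
      Just (n, rest) -> go (acc + n) (C.dropWhile isSpace rest)
```

It could be wired to standard input with something like C.getLine >>= print . sumInts. Whether it beats the unfoldr version depends on how well GHC fuses the list away in that version, so it would need to be measured.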