Can Haskell beat C at counting letters?

Spoiler: yes. See below.

I am trying to optimize a letter counter so that it matches C. So far I have fought it down to a 2x deficit.

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.accumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        letters1
  where
    len = B.length bs
    letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))

Some notes:

It was very slow until I changed Data.Vector to Data.Vector.Unboxed. Why is that?

The army of () values that gets generated may or may not actually be created.

import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.accumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        letters1
  where
    len = B.length bs
    letters1 = V.generate len (\i -> (fromIntegral $! B.index bs i, ()))

printCounts :: V.Vector Int -> IO ()
printCounts cs =
    mapM_
        (uncurry $ printf "%c: %d\n")
        (zip (map toEnum [0..255] :: String) (V.toList cs))

main :: IO ()
main = do
    filename <- fmap head getArgs
    f <- B.readFile filename
    let counts = letterCount f
    printCounts counts

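Update

I think the performance ceiling comes down to this bug: [link not preserved]. I am not convinced that further optimization is impossible, but it is unlikely to get close to C, because runST brings some overhead.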
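
For readers who want to see what a runST-based counter looks like, here is a minimal sketch of an explicit loop over a mutable unboxed vector. It is not the code that was benchmarked in this thread; the name letterCountST is hypothetical, and the sketch assumes the vector and bytestring packages.

```haskell
import Control.Monad.ST (runST)
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as MV

-- | Count byte occurrences with an explicit loop over a mutable
-- unboxed vector. Hypothetical helper, not the benchmarked code.
letterCountST :: B.ByteString -> V.Vector Int
letterCountST bs = runST $ do
  counts <- MV.replicate 256 0
  let len = B.length bs
      go i
        | i >= len = pure ()
        | otherwise = do
            -- A Word8 is always in 0..255, so the unchecked write stays
            -- inside the 256-slot table; i < len keeps the read in range.
            MV.unsafeModify counts (+ 1) (fromIntegral (BU.unsafeIndex bs i))
            go (i + 1)
  go 0
  -- Safe here because counts is not used again after freezing.
  V.unsafeFreeze counts

main :: IO ()
main =
  -- Bytes 97, 98, 99 are 'a', 'b', 'c'.
  print (V.toList (V.slice 97 3 (letterCountST (B.pack [97, 98, 97, 99, 97]))))
  -- prints [3,1,1]
```

Whether this is faster or slower than the accumulate version depends on the compiler and flags, so it would need to be measured rather than assumed.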
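I also fixed the C code according to @Zeta's comment.

Point 1. A boxed vector is an array of pointers to possibly unevaluated expressions that yield an Int. An unboxed vector is just an array of integers. It is necessarily strict, which means less memory allocation and garbage collection, and it probably has better CPU cache behavior. That is why unboxed versions are provided in the first place.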
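
The difference in strictness can be observed directly. The following is a small sketch, assuming the vector package; boxedSum and unboxedBuilds are names made up for this illustration. A boxed vector tolerates an element that is never evaluated, while building the unboxed one has to evaluate every element.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Vector as VB
import qualified Data.Vector.Unboxed as VU

-- Boxed: every slot points to a possibly unevaluated Int, so a slot
-- that is never read is never evaluated.
boxedSum :: Int
boxedSum = boxed VB.! 0 + boxed VB.! 2
  where
    boxed = VB.fromList [1, undefined, 3] :: VB.Vector Int

-- Unboxed: slots hold raw machine integers, so building the vector
-- has to evaluate every element. Returns False if construction threw.
unboxedBuilds :: IO Bool
unboxedBuilds = do
  r <- try (evaluate (VU.fromList [1, undefined, 3] :: VU.Vector Int))
  pure (either (\e -> const False (e :: SomeException)) (const True) r)

main :: IO ()
main = do
  print boxedSum
  built <- unboxedBuilds
  putStrLn (if built
              then "unboxed vector built"
              else "unboxed vector forced the undefined element")
```

The boxed sum should come out as 4 because the middle element is never touched, whereas the unboxed construction should fail on it.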
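
Point 4. My understanding is that conversion between integral types is a no-op at runtime. Basically, Int and Word8 are stored the same way; the only difference is how (+) and similar operations are implemented.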
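
The part about (+) behaving differently is easy to check with base alone. This sketch only shows the semantic difference between the two types; it says nothing about the runtime cost of fromIntegral, and the names wrapAdd and widenAdd are invented for the example.

```haskell
import Data.Word (Word8)

-- Addition on Word8 wraps around modulo 256.
wrapAdd :: Word8 -> Word8
wrapAdd w = w + 1

-- Widen to Int first, then add: same starting value, no wraparound.
widenAdd :: Word8 -> Int
widenAdd w = fromIntegral w + 1

main :: IO ()
main = do
  print (wrapAdd 255)   -- 0
  print (widenAdd 255)  -- 256
  -- Widening preserves the value across the whole Word8 range.
  print (map fromIntegral [minBound, maxBound :: Word8] :: [Int])  -- [0,255]
```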
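
Also, as far as I know, nullary constructors such as () (and True, False, Nothing, ...) are shared between all uses. So you are not "creating" a large army of () values.

It is a bit faster if you remove the bounds checks:
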
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.unsafeAccumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        letters1
  where
    len = B.length bs
    letters1 = V.generate len (\i -> (fromIntegral $! B.unsafeIndex bs i, ()))

printCounts :: V.Vector Int -> IO ()
printCounts cs =
    mapM_
        (uncurry $ printf "%c: %d\n")
        (zip (map toEnum [0..255] :: String) (V.toList cs))

main :: IO ()
main = do
    filename <- fmap head getArgs
    f <- B.readFile filename
    let counts = letterCount f
    printCounts counts

If you compile with -fllvm, Haskell matches C in user time. Surprisingly, if you switch to lazy ByteStrings, the Haskell version beats the C version in real time by a small but significant margin:

import qualified Data.ByteString.Lazy.Char8 as B
import qualified Data.Vector.Unboxed as V
import System.Environment
import Text.Printf

letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
    V.unsafeAccumulate
        (\a _ -> a + 1)
        (V.replicate 256 0)
        (parse bs)

parse :: B.ByteString -> V.Vector (Int, ())
parse = V.unfoldr step
  where
    step s = if B.null s
        then Nothing
        else Just ((fromIntegral . fromEnum $ B.head s, ()), B.tail s)
{-# INLINE parse #-}

printCounts :: V.Vector Int -> IO ()
printCounts cs =
    mapM_
        (uncurry $ printf "%c: %d\n")
        (zip (map toEnum [0..255] :: String) (V.toList cs))

main :: IO ()
main = do
    filename <- fmap head getArgs
    f <- B.readFile filename
    let counts = letterCount f
    printCounts counts

So, Vector + ByteString.Lazy + LLVM > C. I love it!
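
Update

To be fair, I updated the C code to use a single buffer, which avoids a large up-front allocation (or any allocation at all) and is more cache friendly. Now the Haskell and C code show no significant difference: both run in about 190 ms, best of three, on a large 150M input file:
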
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <stdlib.h>

#define CHUNK 16384

int letcnt [256];

/* Add the byte counts of s[0..len) to the global table. */
int* letter_count(unsigned char *s, unsigned int len)
{
    unsigned int i;
    for(i = 0; i < len; i++){
        letcnt[*(s + i)]++;
    }
    return (letcnt);
}

/* Read the file in CHUNK-sized pieces through one stack buffer. */
int* letter_count_chunks(unsigned int len, FILE* f)
{
    unsigned int i;
    unsigned char chunk [CHUNK];
    memset(letcnt, 0, sizeof(letcnt));
    /* i + CHUNK < len rather than i < len - CHUNK: len is unsigned, so the
       subtraction would wrap around for files shorter than CHUNK. */
    for(i = 0; i + CHUNK < len; i+= CHUNK) {
        fread(chunk, CHUNK, 1, f);
        letter_count(chunk, CHUNK);
    }
    /* Remaining tail, at most CHUNK bytes. */
    fread(chunk, len - i, 1, f);
    letter_count(chunk, len - i);
    return letcnt;
}

void print_counts() {
    int i;
    for(i = 0; i < 256; i++) {
        printf("'%c': %d\n", (unsigned char) i, letcnt[i]);
    }
}

// st_size
int main(int argc, char **argv)
{
    assert(argc == 2);
    FILE* f = fopen(argv[1], "r");
    struct stat st;
    stat(argv[1], &st);
    off_t len = st.st_size;
    letter_count_chunks(len, f);
    fclose(f);
    print_counts();
    return 0;
}

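On my (slow) MacBook Air, when I compile both with full optimization (-O3 for C, -O2 for Haskell), they are comparable: 0.016 for C and 0.018 for Haskell. That is not a 2x deficit.

Shouldn't this go on CodeReview.SE? IMHO, "can you do better" is a [...] or a [...] question.

@Zeta Programming puzzles? So many new things.

These timings are too small for any sensible comparison. For example, the Haskell RTS is guaranteed to have a much longer startup time than C.

This is interesting. Why would anyone want a boxed vector?

Because it can be lazy, and you might want that. (The same reason Haskell is lazy in general.)

@MichaelFox, unboxed things can't really be polymorphic.

If the file [...] letter_count and letter_count_chunks should probably return void, since you don't use the return values. Depending on the compiler, you may also get better performance by using the pointer directly in letter_count, for example for(unsigned char*e=s+len;s [...]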
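
One way to act on the objection about tiny timings and RTS startup is to time only the counting step inside the process. The sketch below is an assumption about how that could be done, not something any poster ran: timed is a hypothetical helper built on getCPUTime from base, and letterCount is the strict-ByteString version from the question.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString as B
import qualified Data.Vector.Unboxed as V
import System.CPUTime (getCPUTime)
import System.Environment (getArgs)
import Text.Printf (printf)

-- The counter from the question, unchanged in behaviour.
letterCount :: B.ByteString -> V.Vector Int
letterCount bs =
  V.accumulate (\a _ -> a + 1) (V.replicate 256 0)
    (V.generate (B.length bs) (\i -> (fromIntegral (B.index bs i), ())))

-- | Run an action and return its result together with the CPU time it
-- took, in milliseconds. Hypothetical helper for this illustration.
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getCPUTime            -- picoseconds
  r <- act
  end <- getCPUTime
  pure (r, fromIntegral (end - start) / 1e9)

main :: IO ()
main = do
  [filename] <- getArgs
  f <- B.readFile filename       -- strict read, finished before timing starts
  -- evaluate forces the unboxed vector, so the counting really happens
  -- inside the timed region rather than lazily during printing.
  (counts, ms) <- timed (evaluate (letterCount f))
  printf "counted %d bytes in %.3f ms\n" (V.sum counts) ms
```

This excludes process startup and file I/O from the measurement. A benchmarking library would give more reliable numbers, since a single run is still noisy.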

Compile command for the Haskell version:

ghc -O2 -fllvm letterCount.hs