Chicken Scheme read-line taking too long

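Is there a fast way to read and tokenize a large corpus? I'm trying to read a moderately sized text file, and compiled CHICKEN just seems to hang (I killed the process after about 2 minutes), while Racket, for example, performs acceptably (about 20 seconds). What can I do to get the same performance in CHICKEN? This is the code I'm using to read the file. All suggestions welcome.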

(define *corpus*
  (call-with-input-file "largeish_file.txt"
    (lambda (input-file)
      (let loop ([line (read-line input-file)]
                 [tokens '()])
        (if (eof-object? line)
            tokens
            (loop (read-line input-file)
                  (append tokens (string-split line))))))))

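Try running it with a larger initial heap:

./prog -:hi100M

The program does a lot of allocation, which means the heap has to be resized many times, and that triggers a lot of major GCs (which are very expensive).

With debug output enabled, you can see the heap size changing:

./prog -:d

If you want to see the GC output, try:

./prog -:g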

If you can afford to read the whole file into memory at once, you can use something like the following code, which should be faster:

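(import (chicken io)       ; read-lines
        (chicken string))  ; string-split

(let loop ((lines (with-input-from-file "largeish_file.txt"
                    read-lines)))
  (if (null? lines)
      '()
      (append (string-split (car lines))
              (loop (cdr lines)))))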

Here's some quick benchmark code:

(import (chicken io)
        (chicken string))

;; Warm-up
(with-input-from-file "largeish_file.txt" read-lines)

(time
 (with-output-to-file "a.out"
   (lambda ()
     (display
      (call-with-input-file "largeish_file.txt"
        (lambda (input-file)
          (let loop ([line (read-line input-file)]
                     [tokens '()])
            (if (eof-object? line)
                tokens
                (loop (read-line input-file)
                      (append tokens (string-split line)))))))))))

(time
 (with-output-to-file "b.out"
   (lambda ()
     (display
      (let loop ((lines (with-input-from-file "largeish_file.txt"
                          read-lines)))
        (if (null? lines)
            '()
            (append (string-split (car lines))
                    (loop (cdr lines)))))))))

Here are the results on my system:

$ csc bench.scm && ./bench
28.629s CPU time, 13.759s GC time (major), 68772/275 mutations (total/tracked), 4402/14196 GCs (major/minor), maximum live heap: 4.63 MiB
0.077s CPU time, 0.033s GC time (major), 68778/292 mutations (total/tracked), 10/356 GCs (major/minor), maximum live heap: 3.23 MiB

Making sure we get the same result from both snippets:

$ cmp a.out b.out && echo They contain the same data
They contain the same data

largeish_file.txt was generated by cat'ing a ~100KB syslog file until it had ~10000 lines (mentioning this so you get an idea of the input file's profile).

I got these results with CHICKEN 5.2.0 on a Debian system.
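A likely reason for the size of the gap: in the first snippet, (append tokens (string-split line)) copies the entire accumulated tokens list on every line, so the total work grows quadratically with the number of tokens, and every discarded copy is garbage the GC has to deal with (note the 4402 major GCs above). The second snippet puts each line's short token list in front of the recursive result, so each token is copied only once. If holding the whole file in memory is not an option, the same linear behavior can be had while still reading line by line, by consing tokens onto an accumulator and reversing once at the end. This is a sketch assuming CHICKEN 5 with the (chicken io) and (chicken string) modules; tokenize-file and push-all are hypothetical names, not library procedures.

```scheme
;; Sketch, assuming CHICKEN 5: tokenize a file line by line in linear time.
;; tokenize-file and push-all are hypothetical helpers, not library procedures.
(import (chicken io)       ; read-line
        (chicken string))  ; string-split

;; Cons each element of ITEMS onto ACC. This reverses ITEMS, but each
;; element is touched once and ACC itself is never copied.
(define (push-all items acc)
  (if (null? items)
      acc
      (push-all (cdr items) (cons (car items) acc))))

(define (tokenize-file path)
  (call-with-input-file path
    (lambda (port)
      (let loop ((line (read-line port))
                 (acc '()))
        (if (eof-object? line)
            ;; ACC holds every token in reverse order; flip it once.
            (reverse acc)
            (loop (read-line port)
                  (push-all (string-split line) acc)))))))
```

It returns the tokens in the same order as both snippets above, and can be timed the same way by wrapping (tokenize-file "largeish_file.txt") in (time ...).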

It's also worth tuning other GC parameters (try -:? to get help on the available runtime options; they are also documented in the CHICKEN manual, by the way).