Multithreading in Julia: parallelism when reading a large file

multithreading parallel-processing julia


In Julia v1.1, suppose I have a very large text file (30 GB) and I want to read each line in parallel (multithreaded). How can I do that?

The code below is my attempt at this after checking the documentation, but it does not work at all:

open("pathtofile", "r") do file
    # Get the file size in bytes (used as the total for the progress bar)
    seekend(file)
    fileSize = position(file)
    seekstart(file)

    # skip nseekchars first characters of file
    seek(file, nseekchars)

    # progress bar, because it's a HUGE file
    p = Progress(fileSize, 1, "Reading file...", 40)
    Threads.@threads for ln in eachline(file)
        # do something on ln
        u, v = map(x->parse(UInt32, x), split(ln))
        .... # other interesting things
        update!(p, position(file))
    end    
end

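Note 1: you need "using ProgressMeter" (I want my code to show a progress bar while reading the file).
Note 2: nseekchars is an Int, the number of characters I want to skip at the beginning of the file.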
Note 3: without the Threads.@threads macro next to the for loop, the code works but does not run in parallel.

For maximum I/O performance:
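1. Parallelize the hardware, i.e. use a disk array rather than a single drive. Try searching for "RAID performance" to find many excellent explanations (or ask a separate question).
2. Use Julia's memory-mapping mechanism (the Mmap standard library; see the snippet below).
3. Once you have the memory map, do the processing in parallel. Watch out for threading pitfalls (depending on your actual scenario).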
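Comments:

Reading a file is not CPU-intensive, so parallelizing it will not improve performance. Storing the file on a striped disk array would help more.

@ııı Thanks, I did not know about striped disk arrays. Is this what you mean? Unfortunately, Julia v1.1 does not seem to have chunks available.

I think he is talking about RAID 0 or similar.

@sascha OK, this is all quite new to me. Is there anything in Julia related to this?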
using Mmap
s = open("my_file.txt", "r")
# Map the whole file into memory as a Vector{UInt8}; pages are loaded on demand
a = Mmap.mmap(s)
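
Steps 2 and 3 of the answer could be combined roughly as follows. This is only a minimal sketch, assuming every non-empty line holds two whitespace-separated unsigned integers as in the question, and that nseekchars (counted in bytes here) lands on the start of a line. The names line_aligned_chunks, process_chunk and parallel_sum are hypothetical, and summing the parsed values merely stands in for the "other interesting things". Threads.@threads needs an indexable range to split among threads, which the eachline iterator is not, so the sketch loops over a list of byte ranges instead. The progress bar is left out.

```julia
using Mmap

# Split bytes lo:length(data) into roughly `nchunks` ranges, each ending
# on a newline (or at end of file) so that no line is cut in half.
function line_aligned_chunks(data::AbstractVector{UInt8}, lo::Int, nchunks::Int)
    hi = length(data)
    ranges = UnitRange{Int}[]
    lo > hi && return ranges
    step = cld(hi - lo + 1, nchunks)
    start = lo
    while start <= hi
        stop = min(start + step - 1, hi)
        nl = findnext(isequal(0x0a), data, stop)   # next '\n' at or after stop
        stop = nl === nothing ? hi : nl
        push!(ranges, start:stop)
        start = stop + 1
    end
    return ranges
end

# Parse every line inside byte range `r`; returns the sum of all parsed values
# (a placeholder for the real per-line work).
function process_chunk(data::AbstractVector{UInt8}, r::UnitRange{Int})
    total = UInt64(0)
    i = first(r)
    while i <= last(r)
        nl = findnext(isequal(0x0a), data, i)
        stop = (nl === nothing || nl > last(r)) ? last(r) : nl - 1
        fields = split(String(data[i:stop]))       # copies one line only
        if length(fields) == 2
            u = parse(UInt32, fields[1])
            v = parse(UInt32, fields[2])
            total += UInt64(u) + UInt64(v)
        end
        i = stop + 2                               # skip past the newline
    end
    return total
end

function parallel_sum(path::AbstractString; nseekchars::Int = 0)
    # The mapping stays valid after the stream is closed.
    data = open(path, "r") do io
        Mmap.mmap(io)
    end
    chunks = line_aligned_chunks(data, nseekchars + 1, 4 * Threads.nthreads())
    partial = zeros(UInt64, length(chunks))        # one slot per chunk
    Threads.@threads for k in 1:length(chunks)
        partial[k] = process_chunk(data, chunks[k])
    end
    return sum(partial)
end
```

Julia has to be started with more than one thread for this to run in parallel (for example by setting the JULIA_NUM_THREADS environment variable). Each iteration writes only to its own slot of partial, so no locking is needed. Whether this beats a plain sequential loop depends on how expensive the per-line work is compared with the disk read, as the comments point out.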