Julia: parsing an array of strings


I have a one-dimensional array of strings (Array{String,1}) that describes a matrix of floats (see below). I need to parse this matrix. Any slick suggestions?

Julia 1.5
macOS

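Yes, I did read this from a file. I don't want to use CSV to read the whole file, because I want to keep the option of reading the whole file with memory-mapped I/O, and I don't think CSV offers that. I also have some more complicated lines that mix strings and numbers, as well as lines of strings only, that I need to parse, which rules out a delimited-file reader. The columns are separated by two spaces.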

julia> lines[24+member_total:idx-1]
49-element Array{String,1}:
 "0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  2.1998550E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  3.1998450E+00  1.3308000E+01"
 "0.0000000E+00  0.0000000E+00  4.1998350E+00  1.3308000E+01"
 ⋮
 "0.0000000E+00  0.0000000E+00  5.9699895E+01  1.4000000E-01"
 "0.0000000E+00  0.0000000E+00  6.0199890E+01  1.0100000E-01"
 "0.0000000E+00  0.0000000E+00  6.0699885E+01  6.2000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1199880E+01  2.3000000E-02"
 "0.0000000E+00  0.0000000E+00  6.1500000E+01  0.0000000E+00"

I put together a workaround. It is not the slickest thing, but it works:

function rmspaces(line)
    line = replace(line, "\t" => " ")
    # println("line: ", line)
    while occursin("  ", line)
        line = replace(line, "  "=>" ")
        # println("line: ", line)
    end

    return line
end

function readmatrix(lines, numcolumns::Int64; type=Float64)
    # Collapse tabs and runs of spaces into single spaces (note: this modifies `lines` in place)
    for i=1:length(lines)
        lines[i] = rmspaces(lines[i])
    end

    matrix = zeros(type, length(lines), numcolumns) # allocate with the requested element type

    for i=1:length(lines)
        idx = 1 # set the initial stop at the beginning
        spot = 1
        for j=1:length(lines[i])
            if lines[i][j]==' ' && j>1 #Stops at spaces
                number = parse(type,lines[i][idx:j]) #from the last stop to this one
                idx = j #Set this stop in memory
                matrix[i,spot] = number
                spot += 1
            end
        end
        if spot<numcolumns+1 #If there isn't a space after the last number,
            #we need to attach the last number in every row. If the last number
            #was appended, then the spot will be increased to be more than the number
            #of columns.
            number = parse(type, lines[i][idx:end])
            matrix[i,spot] = number
        end
    end
    return matrix
end

3×4 Array{Float64,2}:
 0.0  0.0  0.0       13.308
 0.0  0.0  0.199875  13.308
 0.0  0.0  1.19986   13.308
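
For comparison, the hand-written loop above can probably be condensed with `split` and `parse`. This is only a sketch: `parse_rows` is a hypothetical name that does not appear in the thread, and it assumes every row is purely numeric, non-empty, and has the same number of columns.

```julia
# Hypothetical helper (not from the original post): parse whitespace-separated
# numeric rows into a matrix without modifying the input vector.
function parse_rows(lines::AbstractVector{<:AbstractString}; type=Float64)
    # split(line) with no delimiter splits on any run of whitespace,
    # so double spaces and tabs need no special handling.
    rows = [parse.(type, split(line)) for line in lines]
    ncols = length(first(rows))
    all(r -> length(r) == ncols, rows) || error("rows have differing column counts")
    # hcat puts each row in a column, so transpose back to rows × columns.
    return permutedims(reduce(hcat, rows))
end

sample = ["0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
          "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"]
m = parse_rows(sample)   # 2×4 matrix of Float64
```

Unlike the loop version, this does not need the column count up front, but it will throw on the mixed text-and-number lines mentioned in the question.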
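I strongly advise against reinventing the wheel with a custom-made parser, because such solutions are rarely robust in production.
If the file is held in a single string, use: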
using DelimitedFiles
readdlm(IOBuffer(strs))

If your file is held as a vector of strings, use:
cat(readdlm.(IOBuffer.(strsa))...,dims=1)
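
As a small, self-contained illustration of the two calls above: the sample rows are copied from the question, `strsa` is the answer's name for the vector of strings, and joining the rows with newlines before a single `readdlm` call is an alternative added here, not something the answer proposes.

```julia
using DelimitedFiles

# Sample rows taken from the question.
strsa = ["0.0000000E+00  0.0000000E+00  0.0000000E+00  1.3308000E+01",
         "0.0000000E+00  0.0000000E+00  1.9987500E-01  1.3308000E+01",
         "0.0000000E+00  0.0000000E+00  1.1998650E+00  1.3308000E+01"]

# The answer's approach: parse each line separately, then stack the 1×4 results.
per_line = cat(readdlm.(IOBuffer.(strsa))..., dims=1)

# Alternative (an assumption, not from the answer): join the rows into one
# newline-separated string and parse it with a single readdlm call.
joined = readdlm(IOBuffer(join(strsa, '\n')), Float64)
```

Both give the same 3×4 matrix; the joined form avoids splatting a long vector into `cat`, which may matter for inputs with many rows.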

Finally, using memory mapping does not conflict with CSV-style reading:
using Mmap

s = open("d.txt") # d.txt contains your lines
                  # to read and write, open with "r+" ("w+" would truncate the file)
 
m = Mmap.mmap(s, Vector{UInt8}) # memory mapping of your file

readdlm(IOBuffer(m))
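
The memory-mapping steps can be tried end to end with a throwaway file. In this sketch the temporary file and its two sample rows are assumptions made only to keep the example self-contained; they stand in for `d.txt`.

```julia
using DelimitedFiles, Mmap

# Write a small temporary file so the sketch does not depend on d.txt.
path, io = mktemp()
write(io, "0.0000000E+00  1.9987500E-01\n6.1500000E+01  2.3000000E-02\n")
close(io)

data = open(path) do s
    bytes = Mmap.mmap(s, Vector{UInt8})  # file contents as a byte vector
    readdlm(IOBuffer(bytes), Float64)    # parse directly from the mapped bytes
end
```

Here `data` is an ordinary 2×2 `Matrix{Float64}`, so it stays usable after the file is closed.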


You can also always rewind the stream to the beginning and read the data, regardless of the memory mapping:
seek(s,0)
readdlm(s)
seek(s,0) # reset the stream

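Me too. Thanks, this is a very slick solution. I hadn't thought of using an IOBuffer as the input to CSV or readdlm. For people like me: strs is a string containing the file, and strsa is an array of strings containing the file.

I have a couple of questions for you, since you are clearly a Julia expert. (1) Do you have a slick way to parse a string that has both numbers and letters, when I only want the numbers? For example: "49 kp_total - Total number of key points (-) [must be at least 3]"

(2) Do you know a good way to take the values I read in (matrices, strings, and scalars) and write them to a file together with other strings (headers, variable names, and explanations, as in my question above)? I read in a string that contains the scalar value I want, the variable name, and then an explanation. If I change the scalar value, I will write it back to the file.

For question 1, where such data has to be extracted from strings, I would try regular expressions or, in more complex cases, semantic analysis of the text. For question 2, the most natural answer is to use the JSON format.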
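
A minimal sketch of the regular-expression route for lines like the one quoted above. `first_number` is a hypothetical helper name, and the pattern assumes the value of interest is the first number on the line, written as an integer, a decimal, or in E-notation.

```julia
# Hypothetical helper: return the first number found in `line`, or `nothing`.
function first_number(line::AbstractString; type=Float64)
    # Optional sign, digits with an optional fraction, optional exponent.
    m = match(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?", line)
    return m === nothing ? nothing : parse(type, m.match)
end

line = "49 kp_total - Total number of key points (-) [must be at least 3]"
kp_total = first_number(line; type=Int)   # → 49
```

Because the pattern takes the first number it sees, a line whose description contains digits before the value would need a stricter pattern, for example one anchored to the start of the line.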