Julia: cat commands are slow


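I wanted to take a look at the Julia language, so I wrote a small script to import a dataset I'm working with. But when I ran and profiled the script, it turned out to be much slower than a similar script in R. The profiler tells me that all the cat commands perform badly.

The files look like this: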

#
#Metadata
#

Identifier1 data_string1
Identifier2 data_string2
Identifier3 data_string3
Identifier4 data_string4

//

I mainly want to get the data strings and split them into a matrix of single characters. Here is a minimal code example:

function loadfile()
  f = open("/file1")
  first=true
  m = Array(Any, 1,0)

  for ln in eachline(f)

    if ln[1] != '#' && ln[1] != '\n' && ln[1] != '/'
      s = split(ln[1:end-1])
      s = split(s[2],"")

      if first
        m = reshape(s,1,length(s))
        first = false
      else
        s = reshape(s,1,length(s))
        println(size(m))
        println(size(s))
        m = vcat(m, s)
      end

    end
  end
  close(f)
  return m  # without this the function returns nothing (the value of the for loop)
end

Do you have any idea why Julia is slow with the cat commands, or how I could do this differently?

Thanks for any suggestions!

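Using cat like that is slow because it requires a lot of memory allocation. Every time we call vcat we allocate a whole new array m, which is mostly the same as the old m. Here is how I would rewrite your code in a more Julian way, where m is only created at the end: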

function loadfile2()
  f = open("./sotest.txt","r")
  lines = Any[]

  for ln in eachline(f)
    # Skip metadata ('#'), blank lines and the closing "//" marker
    if ln[1] == '#' || ln[1] == '\n' || ln[1] == '/'
      continue
    end

    data_str = split(ln[1:end-1]," ")[2]  # drop the trailing newline, take the second field
    data_chars = split(data_str,"")
    # Can make even faster (2x in my tests) with
    # data_chars = [data_str[i] for i in 1:length(data_str)]
    # But this inherently assumes ASCII data
    push!(lines, data_chars)
  end
  close(f)
  m = hcat(lines...)'  # Stick column vectors together then transpose
end
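
For readers on Julia 1.x, where eachline strips the trailing newline by default, the same collect-then-combine idea might look like the sketch below. It is only an illustration under stated assumptions: the function name parse_rows and the sample text are hypothetical, every data string is assumed to have the same length, and it reads from an IOBuffer rather than a file so that it runs on its own.

```julia
# Sketch for Julia 1.x; parse_rows and the sample text are hypothetical.
function parse_rows(io::IO)
    rows = Vector{Vector{Char}}()
    for ln in eachline(io)                 # trailing newline is stripped by default
        # Skip blank lines, '#' metadata lines and the closing "//" marker
        (isempty(ln) || startswith(ln, "#") || startswith(ln, "/")) && continue
        fields = split(ln)                 # split on whitespace
        length(fields) < 2 && continue     # ignore lines without a data string
        push!(rows, collect(fields[2]))    # one Char per element
    end
    isempty(rows) && return Matrix{Char}(undef, 0, 0)
    # Build the matrix once: each row vector becomes a column, then swap dimensions.
    # Assumes all rows have equal length; otherwise hcat throws a DimensionMismatch.
    return permutedims(reduce(hcat, rows))
end

sample = """
#
#Metadata
#

Identifier1 ACGT
Identifier2 TGCA
//
"""

m = parse_rows(IOBuffer(sample))
println(size(m))
```

To read a real file, the same function could be called as open(parse_rows, path), which also closes the file when the function returns.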

I made a 10,000-line version of your example data and found the following performance:

Old version:
elapsed time: 3.937826405 seconds (3900659448 bytes allocated, 43.81% gc time)
elapsed time: 3.581752309 seconds (3900645648 bytes allocated, 36.02% gc time)
elapsed time: 3.57753696 seconds (3900645648 bytes allocated, 37.52% gc time)
New version:
elapsed time: 0.010351067 seconds (11568448 bytes allocated)
elapsed time: 0.011136188 seconds (11568448 bytes allocated)
elapsed time: 0.010654002 seconds (11568448 bytes allocated)
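
The gap in allocated bytes can be reproduced in isolation with a small experiment. The sketch below assumes Julia 1.x; the helper names grow_with_vcat and build_once and the synthetic rows are made up for illustration, and the absolute numbers will differ from the timings above.

```julia
# Hypothetical helpers; rows stands in for the parsed data strings.
function grow_with_vcat(rows)
    m = Matrix{Char}(undef, 0, length(first(rows)))
    for r in rows
        m = vcat(m, permutedims(r))   # allocates a new matrix and copies all earlier rows
    end
    return m
end

function build_once(rows)
    # The full matrix is allocated once by hcat, plus one transposed copy
    return permutedims(reduce(hcat, rows))
end

rows = [collect("ACGT"^10) for _ in 1:500]   # 500 rows of 40 characters

# Warm up first so that compilation is not counted
grow_with_vcat(rows); build_once(rows)

bytes_vcat = @allocated grow_with_vcat(rows)
bytes_once = @allocated build_once(rows)
println("vcat loop: ", bytes_vcat, " bytes; built once: ", bytes_once, " bytes")
```

Because the vcat loop copies i rows on the i-th pass, its total allocation grows roughly with the square of the number of rows, while building the matrix once grows linearly.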

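Thanks!! I will definitely use your [data_str[i] for i in 1:length(data_str)] (as it is all ASCII data). Coming from R, I'm still a bit afraid of using for loops, so I would never have thought of a solution like that... :)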
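
If the data might one day contain non-ASCII characters, collect(data_str) is a possible alternative to the indexing comprehension, since it iterates over characters rather than byte positions. The sketch below assumes Julia 1.x, and the strings in it are made-up examples.

```julia
data_str = "ACGT"

by_index   = [data_str[i] for i in 1:length(data_str)]  # indexes by position, safe for ASCII only
by_collect = collect(data_str)                          # iterates characters, works for any string

# Made-up non-ASCII example: 'é' occupies two bytes in UTF-8,
# so string indices and character positions stop matching after it.
accented = "aéb"
chars = collect(accented)

println(by_index == by_collect)
println(chars)
```

For pure ASCII input the two forms give the same vector of characters, so the choice mostly matters when the encoding is not guaranteed.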