Julia: efficiently converting a DataFrame column with missing values from strings to floats


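I know how to convert a column of string values to floats:

using DataFrames

df = DataFrame(:a => ["2,2", "3,3"])
df.a = parse.(Float64, replace.(df.a, (','=>'.',)))

But how do I convert a column that contains missing values?

df = DataFrame(:a => ["2,2", missing])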
I tried passmissing, but it does not seem to be optimized for time or memory:

julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
  0.140616 seconds (143.84 k allocations: 7.424 MiB)
2-element Array{Float64,1}:
 2.2
 3.3
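
It is optimized. The first call includes compilation time; once f is compiled, the second f.(df.a) call below takes 0.000047 seconds and 11 allocations: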

julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
  0.140616 seconds (143.84 k allocations: 7.424 MiB)
2-element Array{Float64,1}:
 2.2
 3.3

julia> f(x) = passmissing(v->parse(Float64, replace(v, (','=>'.'))))(x)
f (generic function with 1 method)

julia> @time f.(df.a)
  0.031789 seconds (108.92 k allocations: 5.402 MiB)
2-element Array{Float64,1}:
 2.2
 3.3

julia> @time f.(df.a)
  0.000047 seconds (11 allocations: 576 bytes)
2-element Array{Float64,1}:
 2.2
 3.3
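The only issue is that if you define a new anonymous function every time you want to do the replacement, compilation is triggered again each time.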
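
One way to avoid paying that cost repeatedly is to give the conversion a name and define it once. The following is a minimal sketch, assuming the Missings package is available (it provides passmissing); parse_decimal_comma and parse_or_missing are hypothetical names chosen for illustration, and col stands in for df.a.

```julia
using Missings  # assumed to be installed; provides passmissing

# Hypothetical helper: parse a decimal-comma string such as "2,2" as a Float64.
parse_decimal_comma(s::AbstractString) = parse(Float64, replace(s, ',' => '.'))

# Wrap it once so that missing inputs are returned as missing.
const parse_or_missing = passmissing(parse_decimal_comma)

col = ["2,2", missing, "3,3"]      # stand-in for df.a
result = parse_or_missing.(col)    # compiled on the first call, reused afterwards
```

Because the same named function is reused, later broadcasts over columns of the same element type should not need to compile it again.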

If you do have to use a new anonymous function each time, you can use a comprehension, since it is cheaper to compile than a broadcast:

julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
  0.051069 seconds (143.81 k allocations: 7.420 MiB)
2-element Array{Float64,1}:
 2.2
 3.3

julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
  0.072161 seconds (143.81 k allocations: 7.421 MiB)
2-element Array{Float64,1}:
 2.2
 3.3

julia> @time [passmissing(v->parse(Float64, replace(v, (','=>'.'))))(v) for v in df.a]
  0.039240 seconds (76.63 k allocations: 3.960 MiB)
2-element Array{Float64,1}:
 2.2
 3.3

julia> @time [passmissing(v->parse(Float64, replace(v, (','=>'.'))))(v) for v in df.a]
  0.051859 seconds (76.63 k allocations: 3.961 MiB, 25.99% gc time)
2-element Array{Float64,1}:
 2.2
 3.3
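(You can see this from the allocation counts: roughly 76.6 k for the comprehension versus 143.8 k for the broadcast.)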
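
As a further illustration, the comprehension can also be combined with a named helper instead of an anonymous function. This sketch swaps passmissing for plain multiple dispatch, which is not what the answer above uses; to_float is a hypothetical name, and col stands in for df.a. It needs no packages.

```julia
# Hypothetical helper with two methods, selected by dispatch on the element type.
to_float(::Missing) = missing
to_float(s::AbstractString) = parse(Float64, replace(s, ',' => '.'))

col = ["2,2", missing, "3,3"]            # stand-in for df.a
converted = [to_float(v) for v in col]   # missing entries pass through unchanged
```

Since to_float is defined once, repeating the comprehension reuses the compiled methods rather than compiling a fresh closure.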