Julia: efficiently converting a DataFrame column with missing values from strings to floats
I know how to convert a column of string values to floating-point values:
df = DataFrame(:a => ["2,2", "3,3"])
df.a = parse.(Float64, replace.(df.a, (','=>'.',)))
But how do I convert a column that contains missing values?
df = DataFrame(:a => ["2,2", missing])
I tried passmissing, but it does not seem to be optimized for time and memory:
julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
0.140616 seconds (143.84 k allocations: 7.424 MiB)
2-element Array{Float64,1}:
2.2
3.3
julia> f(x) = passmissing(v->parse(Float64, replace(v, (','=>'.'))))(x)
f (generic function with 1 method)
julia> @time f.(df.a)
0.031789 seconds (108.92 k allocations: 5.402 MiB)
2-element Array{Float64,1}:
2.2
3.3
julia> @time f.(df.a)
0.000047 seconds (11 allocations: 576 bytes)
2-element Array{Float64,1}:
2.2
3.3
It is optimized. In the timings above, the first call of each expression includes compilation time; the repeated call @time f.(df.a) takes only 0.000047 seconds with 11 allocations (576 bytes).
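The only problem is that defining a new anonymous function every time you want to do the replacement triggers compilation again.
If you do have to use a new anonymous function each time, you can use a comprehension instead, since it is cheaper than broadcasting in terms of compilation cost: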
julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
0.051069 seconds (143.81 k allocations: 7.420 MiB)
2-element Array{Float64,1}:
2.2
3.3
julia> @time passmissing(v->parse(Float64, replace(v, (','=>'.')))).(df.a)
0.072161 seconds (143.81 k allocations: 7.421 MiB)
2-element Array{Float64,1}:
2.2
3.3
julia> @time [passmissing(v->parse(Float64, replace(v, (','=>'.'))))(v) for v in df.a]
0.039240 seconds (76.63 k allocations: 3.960 MiB)
2-element Array{Float64,1}:
2.2
3.3
julia> @time [passmissing(v->parse(Float64, replace(v, (','=>'.'))))(v) for v in df.a]
0.051859 seconds (76.63 k allocations: 3.961 MiB, 25.99% gc time)
2-element Array{Float64,1}:
2.2
3.3
(You can see this from the number of allocations.)
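To avoid paying the compilation cost repeatedly, one option is to define the conversion once as a named function and reuse it. The following is a minimal sketch, not the only way to do it: the names parse_decimal_comma and parse_or_missing are hypothetical, and it assumes passmissing is available through DataFrames (which re-exports it from Missings.jl).

```julia
using DataFrames  # assumed to re-export passmissing from Missings.jl

# Hypothetical helper: parse a decimal-comma string such as "2,2" into a Float64.
parse_decimal_comma(s::AbstractString) = parse(Float64, replace(s, ',' => '.'))

# Wrap the helper once at top level, so every later call reuses the same
# function object instead of creating (and compiling) a new anonymous function.
const parse_or_missing = passmissing(parse_decimal_comma)

df = DataFrame(:a => ["2,2", missing, "3,3"])

# Strings are parsed; missing entries are passed through unchanged.
df.a = parse_or_missing.(df.a)
```

With this layout, only the first call of parse_or_missing.(df.a) should include compilation; later calls on columns of the same type reuse the compiled code.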