Getting computed NA column values in a Julia DataFrame that still respond to dropna


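I'm trying to use `NA` to indicate that the computed value for a given DataFrame "row" is meaningless (or perhaps cannot be computed). How do I get a column containing computed `NA`s that still responds to `dropna`?

For example: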

using DataFrames

df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] ./ d[:C] # vectorized to work with `by`
end

# What I'm looking for is something equivalent to this list
# comprehension, but that returns a DataFrame or DataArray
# since normal Arrays don't respond to `dropna`

comprehension = [foo(frame) for frame in eachrow(df)]

This is a bit tricky because DataFrame rows are awkward objects. For example, I would think this is perfectly reasonable:

using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] / d[:C] # scalar division: each `d` is a single row
end
comp = DataArray(Float64,4)
map!(r->foo(r), eachrow(df))

But this results in:

`map!` has no method matching map!(::Function, ::DFRowIterator{DataFrame})


However, if you just want to do a `by` that doesn't always return a row, you can do the following:
using DataFrames
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B returns an empty array
function foo(d)
    if d[1,:B] == 0
        return []
    end
    return d[1,:B] / d[1,:C] # expect only a single row per group in the `by`
end

by(df, [:A,:B,:C]) do d
    foo(d)
end

which results in:

3x4 DataFrame
| Row | A | B | C | x1       |
|-----|---|---|---|----------|
| 1   | 1 | 1 | 5 | 0.2      |
| 2   | 3 | 2 | 3 | 0.666667 |
| 3   | 4 | 3 | 3 | 1.0      |

One option is to extend `Base.convert` and `DataArrays.dropna` so that `dropna` can handle normal `Vector`s:
using DataFrames

function Base.convert{T}(::Type{DataArray}, v::Vector{T})
  da = DataArray(T[],Bool[])
  for val in v
    push!(da, val)
  end
  return da
end

function DataArrays.dropna(v::Vector)
  return dropna(convert(DataArray,v))
end

Now the example should work as expected:
df = DataFrame(A = 1:4, B = [1, 0, 2, 3], C = [5, 4, 3, 3])

# A value of 0 in column B should yield a foo of NA
function foo(d)
  if d[:B] == 0
    return NA
  end
  return d[:B] / d[:C]
end

comprehension = [foo(frame) for frame in eachrow(df)]  

dropna(comprehension) #=> Array{Any,1}: [0.2, 0.667, 1.]

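Even without the extended `dropna`, the extended `convert` allows the comprehension to be inserted into the DataFrame as a new DataArray column, preserving the `NA`s and their proper drop behavior: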
conv = convert(DataArray, comprehension)
insert!(df, size(df, 2) + 1, conv, :foo)
#=> 4x4 DataFrame
#  | Row | A | B | C | foo      |
#  |-----|---|---|---|----------|
#  | 1   | 1 | 1 | 5 | 0.2      |
#  | 2   | 2 | 0 | 4 | NA       |
#  | 3   | 3 | 2 | 3 | 0.666667 |
#  | 4   | 4 | 3 | 3 | 1.0      |

typeof(df[:foo]) #=> DataArray{Any,1} (constructor with 1 method)
dropna(df[:foo]) #=> Array{Any,1}: [0.2, 0.667, 1.]

You can do this:
using DataFramesMeta
result = @with(df, map(foo, :B, :C)) 

#=> DataArray{Any,1}: [0.2, NA, 0.667, 1.0]

...if `foo` can be rewritten to refer to individual values rather than the whole DataFrame:

function foo(b, c)
  if b == 0
    return NA
  end
  return b / c
end

Similarly, if you want a new DataFrame that includes the new column, use `@transform`:

tdf = @transform(df, foo = map(foo, :B, :C))
#=>4x4 DataFrame
#  | Row | A | B | C | foo      |
#  |-----|---|---|---|----------|
#  | 1   | 1 | 1 | 5 | 0.2      |
#  | 2   | 2 | 0 | 4 | NA       |
#  | 3   | 3 | 2 | 3 | 0.666667 |
#  | 4   | 4 | 3 | 3 | 1.0      |

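A new DataFrame with the rows dropped looks like a useful strategy in many cases, but it doesn't fit my desire to keep the `NA`s in the result. I'd like the `NA`s to stay in the column so that they can poison subsequent element-wise calculations involving that column, while still being able to drop them when appropriate (e.g., when computing the column mean).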
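
The behavior described here (missing entries that poison element-wise results but can be dropped before an aggregate) can be sketched in plain Julia. This is only an illustration under an assumption: it targets Julia 1.x, where `missing` and `skipmissing` from Base stand in for the `NA` and `dropna` used in the DataArrays-era code above. It reuses the two-argument `foo` and the column values from the example; the variable names `col`, `doubled`, `poisoned_mean`, and `clean_mean` are made up for this sketch.

```julia
using Statistics  # standard library; provides `mean`

# Two-argument `foo` from the answer above, with `missing` standing in
# for `NA` (an assumption made for this sketch).
function foo(b, c)
    b == 0 && return missing
    return b / c
end

B = [1, 0, 2, 3]
C = [5, 4, 3, 3]

# Computed column: the entry where B == 0 is `missing`.
col = map(foo, B, C)

# Element-wise arithmetic keeps the missing entry missing ("poisoning").
doubled = col .* 2

# An aggregate over the raw column is poisoned as well...
poisoned_mean = mean(col)

# ...unless the missing entries are skipped first.
clean_mean = mean(skipmissing(col))
```

Keeping the missing entries in the column and skipping them only at the point of aggregation matches the stated goal: later element-wise computations still see which rows had no meaningful value.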