
Checking elements in multiple columns of a DataFrame in Julia


I have a question about using conditions inside a loop when operating on a DataFrame.

For example, I have a DataFrame

df:

a b c

1 2 5
3 4 3
2 1 7
6 3 6
5 1 9
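I am trying to write a loop that checks two columns (a and b) on each iteration: if the value i is present in either column or in both, it should take the value from column c and store it in an array.

With that, I can later perform statistical operations such as finding the mean of the array.

I have written a simplified code snippet for this task: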

for i in 1:5
  result1 = Float64[]
  result2 = Float64[]
  if (df[:, :a] = i) 
      push!(result1, df[:, :c])
  elseif (df[:, :b] = i)
      push!(result2, df[:, :c])
  end

  unique!(result1)
  unique!(result2)

  result = vcat(result1, result2)

  global mean_val = mean(result)
end
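Here, i ranges from 1 to 5. For each value, columns a and b are checked, and if the value is present, the value from column c should be pushed to the corresponding result array.

I tried other suggestions from the community, such as:

Code example 1:
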
for i in 1:5
  mean_val = mean(df[:, :c] for i in ("a", "b")
end
Code example 2:

for i in 1:5
  df.row = axes(df, 1)
  mean_val = mean((filter(x->x[:a] == i || x[:b] == i ,df))[:c])
end
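However, none of these work or return the desired output.

Please advise on what I am doing wrong in my code. Also, please point me to any documentation that explains how to use multiple conditions in a statement and how to access DataFrame elements for other operations in Julia.

Thank you in advance.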

A first way to achieve what (I think) you want is to take a subset of the DataFrame by indexing:

julia> using DataFrames
julia> df = DataFrame(a = rand(1:5, 10), b = rand(1:5, 10), c = rand(1:100, 10))
10×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      2     25
   2 │     5      4     72
   3 │     4      3     37
   4 │     4      3     46
   5 │     3      2     31
   6 │     3      5     43
   7 │     5      1     35
   8 │     5      2     54
   9 │     1      1     64
  10 │     1      4     57
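
The indexing step that produces filtered_c does not appear in the transcript above. A minimal sketch of what it could look like, assuming a boolean row mask for the value 3 (the same form used in the benchmark further down); the DataFrame values are written out by hand here so that the result is reproducible, and the name mask is only illustrative:

```julia
using DataFrames

# Same values as the randomly generated DataFrame printed above.
df = DataFrame(a = [1, 5, 4, 4, 3, 3, 5, 5, 1, 1],
               b = [2, 4, 3, 3, 2, 5, 1, 2, 1, 4],
               c = [25, 72, 37, 46, 31, 43, 35, 54, 64, 57])

# .== compares element-wise; .| combines the two Bool vectors element-wise.
# The parentheses are required because | binds more tightly than ==.
mask = (df.a .== 3) .| (df.b .== 3)

# Keep the rows where the mask is true and select only column c.
filtered_c = df[mask, :c]
```

With these values, rows 3 to 6 match, so filtered_c holds 37, 46, 31 and 43.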
Then you can compute any statistic on the resulting filtered values:

julia> using Statistics

julia> mean(filtered_c)
39.25

Another way to do the same thing is to use filter to select the rows to keep:

julia> filtered_df = filter(row -> (row.a==3 || row.b==3), df)
4×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     4      3     37
   2 │     4      3     46
   3 │     3      2     31
   4 │     3      5     43

# This way of writing things is equivalent to the previous one, but
# might be more readable in cases where the condition you're checking
# is more complex
julia> filtered_df = filter(df) do row
           row.a == 3 || row.b == 3
       end
4×3 DataFrame
 Row │ a      b      c     
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     4      3     37
   2 │     4      3     46
   3 │     3      2     31
   4 │     3      5     43

julia> mean(filtered_df.c)
39.25
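
The same idea can be applied to the loop from the question. Two things stand out in that loop: it uses = (assignment) where the comparison == is needed, and it recreates the result arrays on every iteration. The following is only a sketch under those observations, using the five-row DataFrame from the question; the name mean_by_value is hypothetical and not part of the original code:

```julia
using DataFrames, Statistics

# The DataFrame from the question.
df = DataFrame(a = [1, 3, 2, 6, 5],
               b = [2, 4, 1, 3, 1],
               c = [5, 3, 7, 6, 9])

# Hypothetical container: maps each value i to the mean of the matching c values.
mean_by_value = Dict{Int, Float64}()

for i in 1:5
    # Keep the rows where i appears in column a, in column b, or in both.
    rows = filter(row -> row.a == i || row.b == i, df)
    # Skip values that match no rows at all.
    nrow(rows) == 0 && continue
    mean_by_value[i] = mean(rows.c)
end
```

Storing one result per value of i avoids the global mean_val from the question being overwritten on every iteration.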

A small efficiency note on the excellent answer by François Févotte: this can be done faster with:

julia> filter([:a, :b] => (a,b) -> a == 3 || b == 3, df, view=true)
4×3 SubDataFrame
 Row │ a      b      c
     │ Int64  Int64  Int64
─────┼─────────────────────
   1 │     3      5      1
   2 │     3      5      9
   3 │     4      3     74
   4 │     4      3     63
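This helps if you have a very large DataFrame. There are two differences here:

  • I use the [:a, :b] => (a, b) -> a == 3 || b == 3 syntax, which is type stable (so it iterates over the rows faster)
  • I use view=true to produce a view of the source DataFrame, which allocates much less memory (this can matter for very large DataFrames)

Here is a small example of the different row-subsetting options on a larger DataFrame: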

    julia> df = DataFrame(a=rand(1:3, 10^8), b=rand(1:3, 10^8), c=rand(10^8));
    
    julia> function test()
               @time filter(row -> (row.a==3 || row.b==3), df)
               @time df[(df.a .== 3) .| (df.b .== 3), :]
               @time @view df[(df.a .== 3) .| (df.b .== 3), :]
               @time filter([:a, :b] => (a,b) -> a == 3 || b == 3, df)
               @time filter([:a, :b] => (a,b) -> a == 3 || b == 3, df, view=true)
               return nothing
           end
    test (generic function with 1 method)
    
    julia> test()
     19.912672 seconds (333.67 M allocations: 6.652 GiB, 5.71% gc time, 0.41% compilation time)
      1.152460 seconds (29 allocations: 1.667 GiB, 14.88% gc time)
      0.515334 seconds (15 allocations: 435.807 MiB, 40.49% gc time)
      1.066756 seconds (412.82 k allocations: 1.689 GiB, 5.56% gc time, 12.54% compilation time)
      0.646710 seconds (382.98 k allocations: 455.835 MiB, 31.27% gc time, 23.02% compilation time)
    
    julia> test()
     18.194791 seconds (333.34 M allocations: 6.635 GiB, 4.87% gc time)
      1.018816 seconds (29 allocations: 1.667 GiB, 15.34% gc time)
      0.469027 seconds (15 allocations: 435.807 MiB, 41.19% gc time)
      0.912572 seconds (30 allocations: 1.667 GiB, 5.32% gc time)
      0.480374 seconds (16 allocations: 435.807 MiB, 41.15% gc time)
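
One consequence of view=true is worth keeping in mind: the result is a SubDataFrame backed by the source DataFrame, not an independent copy. A small sketch on a hand-written four-row DataFrame (the values and the names copied and viewed are illustrative assumptions):

```julia
using DataFrames

df = DataFrame(a = [1, 3, 2, 3], b = [3, 1, 2, 2], c = [10, 20, 30, 40])

# Without view=true, filter returns a new DataFrame with copied columns.
copied = filter([:a, :b] => (a, b) -> a == 3 || b == 3, df)

# With view=true, filter returns a SubDataFrame that refers back to df.
viewed = filter([:a, :b] => (a, b) -> a == 3 || b == 3, df, view = true)

# Writing through the view changes the source; the copy is unaffected.
# The first matching row is row 1 of df (b == 3).
viewed[1, :c] = 99
```

After the assignment, df.c[1] is 99 while copied.c[1] is still 10, so a view is the cheaper option only when sharing data with the source is acceptable.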
    