Multi-level indexing of a DataFrame in Julia?

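How can I apply multi-level indexing to a DataFrame in Julia? Or is there any other method, approach, or package that achieves this?

Update: Python code example:

import numpy as np
import pandas as pd
arrays = [np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
          np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)
df

Output: ->

Thanks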

Do you mean something like this?

julia> # Initialise data structure
       a = [
         [1,2],
         [3,4,5]
       ]
2-element Vector{Vector{Int64}}:
 [1, 2]
 [3, 4, 5]

julia> # Do multilevel indexing

julia> a[1][1]
1

julia> a[2][3]
5

I understand your question, but the key point is what you need to do with the index.

Here is how groupby works:

julia> using DataFrames

julia> df = DataFrame(x=repeat(["bar", "baz"], inner=3), y=repeat(["one", "two"], outer=3), z=1:6)
6×3 DataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     two         2
   3 │ bar     one         3
   4 │ baz     two         4
   5 │ baz     one         5
   6 │ baz     two         6

julia> groupby(df, :x) # 1-level index
GroupedDataFrame with 2 groups based on key: x
First Group (3 rows): x = "bar"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     two         2
   3 │ bar     one         3
⋮
Last Group (3 rows): x = "baz"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     two         4
   2 │ baz     one         5
   3 │ baz     two         6

julia> groupby(df, :y) # 1-level index
GroupedDataFrame with 2 groups based on key: y
First Group (3 rows): y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
   3 │ baz     one         5
⋮
Last Group (3 rows): y = "two"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     two         2
   2 │ baz     two         4
   3 │ baz     two         6

julia> groupby(df, [:x, :y]) # 2-level index
GroupedDataFrame with 4 groups based on keys: x, y
First Group (2 rows): x = "bar", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
⋮
Last Group (1 row): x = "baz", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     one         5
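
One common reason to want a multi-level index is per-group aggregation. The following is a minimal sketch, assuming the same df as above and a recent DataFrames.jl release; the output column name z_sum is chosen here only for illustration.

```julia
using DataFrames

# Same data as in the answer above.
df = DataFrame(x=repeat(["bar", "baz"], inner=3),
               y=repeat(["one", "two"], outer=3),
               z=1:6)

# Sum z within each (x, y) group; the result has one row per key combination.
sums = combine(groupby(df, [:x, :y]), :z => sum => :z_sum)
```

The result is an ordinary DataFrame with the key columns x and y kept as regular columns, so it can be grouped again or joined with other tables.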

Here is an example of lookup with a two-level index:

julia> gdf = groupby(df, [:x, :y]) # 2-level index
GroupedDataFrame with 4 groups based on keys: x, y
First Group (2 rows): x = "bar", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     one         1
   2 │ bar     one         3
⋮
Last Group (1 row): x = "baz", y = "one"
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     one         5

julia> gdf[("bar", "two")]
1×3 SubDataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ bar     two         2

julia> gdf[("baz", "two")]
2×3 SubDataFrame
 Row │ x       y       z
     │ String  String  Int64
─────┼───────────────────────
   1 │ baz     two         4
   2 │ baz     two         6
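
Besides tuple keys, a GroupedDataFrame can also be queried in a few other ways. The sketch below assumes a recent DataFrames.jl release and reuses the same data; the variable names (sub, has_key, missing_group, all_keys) are only illustrative.

```julia
using DataFrames

# Same data as in the answer above.
df = DataFrame(x=repeat(["bar", "baz"], inner=3),
               y=repeat(["one", "two"], outer=3),
               z=1:6)
gdf = groupby(df, [:x, :y])

# A NamedTuple key names the levels explicitly.
sub = gdf[(x="baz", y="two")]

# haskey and get avoid a KeyError when a key combination is absent.
has_key = haskey(gdf, ("bar", "two"))
missing_group = get(gdf, ("qux", "one"), nothing)

# keys(gdf) lists every (x, y) combination present in the index.
all_keys = [(k.x, k.y) for k in keys(gdf)]
```

Checking with haskey or get first is useful when the key comes from user input, since indexing with a key that is not present throws an error.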

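Now, DataFrames.jl and pandas differ when it comes to indexing. For pandas you have (see the linked benchmarks):

When the index is unique, pandas uses a hash table to map keys to values, which is O(1). When the index is non-unique and sorted, pandas uses binary search, which is O(log N). When the index is in random order, pandas has to check all keys in the index, which is O(N).

For DataFrames.jl, by contrast, the lookup is always O(1), no matter which source columns you use for the index.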
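
To connect this back to the pandas example in the question, here is a rough sketch of one way to get similar behaviour in DataFrames.jl. It assumes the index levels are stored as ordinary columns; the column names level1, level2 and v1..v4 are hypothetical and not part of the original question.

```julia
using DataFrames

# Two key columns play the role of the two index levels in the pandas example.
level1 = repeat(["bar", "baz", "foo", "qux"], inner=2)
level2 = repeat(["one", "two"], outer=4)
df = DataFrame(level1=level1, level2=level2)

# Four columns of random values, like np.random.randn(8, 4).
for j in 1:4
    df[!, "v$j"] = randn(8)
end

# Grouping on both key columns builds the lookup structure once.
gdf = groupby(df, [:level1, :level2])

# Look up one (level1, level2) combination, similar to df.loc[("foo", "two")].
row = gdf[("foo", "two")]

# Select on the first level only by grouping on it alone, similar to df.loc["foo"].
foo_rows = groupby(df, :level1)[("foo",)]
```

Unlike a pandas MultiIndex, the key columns stay in the table as regular data, so the same DataFrame can be grouped by different column sets as needed.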

Could you give a code example, e.g. in pandas (I assume that is where your question comes from), of what you want to achieve? Then I can suggest how to do it in DataFrames.jl. Typically you use groupby to add an index on any number of columns of a data frame.

@BogumilKaminski Thanks for your answer, I have updated the question with code. I will also try your suggestion.

Thanks for the suggestion, this is an amazing approach. But I wonder whether I can compare it with pandas.MultiIndex()? Maybe I will try converting my data frame to a matrix and then use this approach.