How do I add/edit values in an IndexedTable?


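How do I add or edit values in an IndexedTable? As I understand it, the IndexedTable object itself is immutable but its underlying data is not, so I "understand" why something like the following doesn't work, but I don't know how to get a new IndexedTable with the new data: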

myTable = ndsparse((
             region      = ["US","US","US","US","EU","EU","EU","EU"],
             product     = ["apple","apple","banana","banana","apple","apple","banana","banana"],
             year        = [2011,2010,2011,2010,2011,2010,2011,2010]
           ),(
             production  = [3.3,3.2,2.3,2.1,2.7,2.8,1.5,1.3],
             consumption = [4.3,7.4,2.5,9.8,3.2,4.3,6.5,3.0]
          ))
myTable["EU","banana",2011] = (2.5, 7.5) # ERROR: type Tuple has no field region
myTable["EU","banana",2012] = (2.5, 7.5) # ERROR: type Tuple has no field region
myTable["EU","banana",2011] = (production = 2.5, consumption = 7.5) # ERROR: type Tuple has no field region

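It looks like the functionality you want is in JuliaDB's Base.merge. From its documentation:

Merge rows of a with rows of b. To keep unique keys, the value from b takes priority. A provided function agg will aggregate values from a and b that have the same key(s).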

A working example based on your question:

# tested on Julia 1.0.4
julia> using JuliaDB
julia> myTable = ndsparse((
                    region      = ["US","US","US","US","EU","EU","EU","EU"],
                    product     = ["apple","apple","banana","banana","apple","apple","banana","banana"],
                    year        = [2011,2010,2011,2010,2011,2010,2011,2010]
                  ),(
                    production  = [3.3,3.2,2.3,2.1,2.7,2.8,1.5,1.3],
                    consumption = [4.3,7.4,2.5,9.8,3.2,4.3,6.5,3.0]
                 ))
3-d NDSparse with 8 values (2 field named tuples):
region  product   year │ production  consumption
───────────────────────┼────────────────────────
"EU"    "apple"   2010 │ 2.8         4.3
"EU"    "apple"   2011 │ 2.7         3.2
"EU"    "banana"  2010 │ 1.3         3.0
"EU"    "banana"  2011 │ 1.5         6.5 # Note the old value
"US"    "apple"   2010 │ 3.2         7.4
"US"    "apple"   2011 │ 3.3         4.3
"US"    "banana"  2010 │ 2.1         9.8
"US"    "banana"  2011 │ 2.3         2.5

julia> updated_myTable = ndsparse((
                    region      = ["EU"],
                    product     = ["banana"],
                    year        = [2011]
                  ),(
                    production  = [2.5], # new values here
                    consumption = [7.5]
                 ))
3-d NDSparse with 1 values (2 field named tuples):
region  product   year │ production  consumption
───────────────────────┼────────────────────────
"EU"    "banana"  2011 │ 2.5         7.5

julia> newTable = merge(updated_myTable, myTable, agg = (x,y) -> x)
3-d NDSparse with 8 values (2 field named tuples):
region  product   year │ production  consumption
───────────────────────┼────────────────────────
"EU"    "apple"   2010 │ 2.8         4.3
"EU"    "apple"   2011 │ 2.7         3.2
"EU"    "banana"  2010 │ 1.3         3.0
"EU"    "banana"  2011 │ 2.5         7.5 # Note the updated values here!
"US"    "apple"   2010 │ 3.2         7.4
"US"    "apple"   2011 │ 3.3         4.3
"US"    "banana"  2010 │ 2.1         9.8
"US"    "banana"  2011 │ 2.3         2.5
Note how the agg function resolves the key conflict by keeping the value from the first argument.
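The question also tries to assign a value to a key that does not exist yet ("EU", "banana", 2012). The same merge approach should cover that case. The following is a minimal sketch, assuming that merge simply inserts keys that are absent from the other table; newRow and extendedTable are illustrative names, and the table is shortened to two rows to keep the example self-contained.

```julia
using JuliaDB

# A shortened table with the same key and value columns as in the question.
myTable = ndsparse((
             region      = ["EU", "EU"],
             product     = ["banana", "banana"],
             year        = [2011, 2010]
           ),(
             production  = [1.5, 1.3],
             consumption = [6.5, 3.0]
          ))

# A one-row table holding a key that is not in myTable yet.
newRow = ndsparse((
             region      = ["EU"],
             product     = ["banana"],
             year        = [2012]
           ),(
             production  = [2.5],
             consumption = [7.5]
          ))

# The keys do not conflict, so no agg function is passed.
# merge returns a new table; myTable itself is left unchanged.
extendedTable = merge(myTable, newRow)

# Look up the newly added key.
extendedTable["EU", "banana", 2012]
```

Because merge builds a new table, assign the result back to the original name if you want it to replace the old one.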

Another approach is to edit the data element directly once you have found the correct index:

julia> i = findfirst(isequal((region = "EU", product = "banana", year = 2011)), myTable.index)
4

julia> myTable.data[i]
(production = 1.5, consumption = 6.5)

julia> myTable.data[i] = (production = 2.5, consumption = 7.5)
(production = 2.5, consumption = 7.5)

julia> myTable
3-d NDSparse with 8 values (2 field named tuples):
region  product   year │ production  consumption
───────────────────────┼────────────────────────
"EU"    "apple"   2010 │ 2.8         4.3
"EU"    "apple"   2011 │ 2.7         3.2
"EU"    "banana"  2010 │ 1.3         3.0
"EU"    "banana"  2011 │ 2.5         7.5 # Note the updated values here!
"US"    "apple"   2010 │ 3.2         7.4
"US"    "apple"   2011 │ 3.3         4.3
"US"    "banana"  2010 │ 2.1         9.8
"US"    "banana"  2011 │ 2.3         2.5
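The lookup and assignment above can be wrapped in a small function. This is only a sketch: setvalue! is a hypothetical helper, not part of JuliaDB, and it relies on the same index and data fields used in the session above, which are internal and may differ between versions.

```julia
using JuliaDB

# Hypothetical helper (not a JuliaDB function): overwrite the value stored
# under an existing key, editing the underlying data in place.
function setvalue!(t, key::NamedTuple, value::NamedTuple)
    i = findfirst(isequal(key), t.index)      # position of the key, or nothing
    i === nothing && throw(KeyError(key))     # this approach cannot add new keys
    t.data[i] = value
    return t
end

# A shortened table with the same key and value columns as in the question.
myTable = ndsparse((
             region      = ["EU", "EU"],
             product     = ["banana", "banana"],
             year        = [2011, 2010]
           ),(
             production  = [1.5, 1.3],
             consumption = [6.5, 3.0]
          ))

setvalue!(myTable, (region = "EU", product = "banana", year = 2011),
                   (production = 2.5, consumption = 7.5))
```

Only the data column is touched, so the sorted index stays valid. Adding a key that does not exist still needs merge.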
Hope that helps.
