Ruby: how to reorganize/filter hash values
I have an array of hashes:
data = [{"user_id"=>1, "answer"=>"cupcakes"},
{"user_id"=>1, "answer"=>"Colorado"},
{"user_id"=>1, "answer"=>"newspaper"},
{"user_id"=>2, "answer"=>"fruitcake"},
{"user_id"=>2, "answer"=>"Louisiana"},
{"user_id"=>2, "answer"=>"tv"}]
How can I reorganize it so that it is grouped by "user_id" and lists all the "answer" values in a single hash? For example:
output_data = [{"user_id" => 1, "answer1"=>"cupcakes", "answer2"=>"Colorado", "answer3"=>"newspaper"},
{"user_id" => 2, "answer1"=>"fruitcake", "answer2"=>"Louisiana", "answer3"=>"tv"}]
Or with all the answers in an array:
output_data = [{"user_id" => 1, "answers"=>["cupcakes", "Colorado", "newspaper"]},
{"user_id" => 2, "answers"=>["fruitcake", "Louisiana", "tv"]}]
I am not tied to this particular output. I do need "user_id" as a key, with all of that user's answers organized together. Any suggestions?
You can do that as follows:
Code
def convert(arr)
arr.each_with_object({}) do |g,h|
h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
end.map { |k,v| { "user_id"=>k, "answer"=>v } }
end
Example
convert(data)
#=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
# {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]
Explanation
We have:
enum = data.each_with_object({})
#=> #<Enumerator: [{"user_id"=>1, "answer"=>"cupcakes"},
# {"user_id"=>1, "answer"=>"Colorado"},
# {"user_id"=>1, "answer"=>"newspaper"},
# {"user_id"=>2, "answer"=>"fruitcake"},
# {"user_id"=>2, "answer"=>"Louisiana"},
# {"user_id"=>2, "answer"=>"tv"}]:
# each_with_object({})>
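As you can see, the enumerator yields six elements, each a two-element array consisting of an element of data and an initially empty hash.
Critically, I am using the form of Hash#update (aka merge!) that takes a block to determine the value of a key that is present in both of the hashes being merged.
The first element of enum is passed to the block and assigned to the block variables as follows: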
g, h = enum.next
#=> [{"user_id"=>1, "answer"=>"cupcakes"}, {}]
g #=> {"user_id"=>1, "answer"=>"cupcakes"}
h #=> {}
The block calculation is therefore:
h.update(g["user_id"]=>[g["answer"]])
# {}.update(1=>["cupcakes"])
#=> {1=>["cupcakes"]}
h #=> {1=>["cupcakes"]}
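update's block is not used for this first merge operation because, before the merge, h has no key 1. On the next operation g["user_id"] #=> 1 again, so this time the block is used to determine the value of key 1.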
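The second merge can be traced in the same way. The following is only an illustrative sketch: it starts from the hash left by the first step, and the block variable names old_val and new_val are mine, standing in for o and n above.

```ruby
# State of h after the first element of data has been merged.
h = { 1 => ["cupcakes"] }

# Second element of data. Key 1 is already present in h, so update calls
# the block with the key, the existing value and the incoming value.
g = { "user_id" => 1, "answer" => "Colorado" }
h.update(g["user_id"] => [g["answer"]]) { |_key, old_val, new_val| old_val + new_val }

h #=> {1=>["cupcakes", "Colorado"]}
```

Because the block returns old_val + new_val, the new answer is appended to the existing array rather than replacing it.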
This results in:
h = data.each_with_object({}) do |g,h|
h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
end
#=> { 1=>["cupcakes", "Colorado", "newspaper"],
# 2=>["fruitcake", "Louisiana", "tv"] }
It is then a simple matter to map the key-value pairs of h to the desired array of hashes.
Alternative
An alternative to doing this by merging hashes is the following:
data.each_with_object(Hash.new { |h,k| h[k]=[] }) do |g,h|
h[g["user_id"]] << g["answer"]
end.map { |k,v| { "user_id"=>k, "answer"=>v } }
#=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
# {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]
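If the numbered-key shape from the question ("answer1", "answer2", ...) is wanted instead, the grouped hash can be expanded in one more pass. This is a rough sketch, not part of the answer above; the names grouped, numbered and row are made up for illustration.

```ruby
data = [{"user_id"=>1, "answer"=>"cupcakes"},
        {"user_id"=>1, "answer"=>"Colorado"},
        {"user_id"=>1, "answer"=>"newspaper"},
        {"user_id"=>2, "answer"=>"fruitcake"},
        {"user_id"=>2, "answer"=>"Louisiana"},
        {"user_id"=>2, "answer"=>"tv"}]

# Collect the answers per user, as in the alternative above.
grouped = data.each_with_object(Hash.new { |h, k| h[k] = [] }) do |g, h|
  h[g["user_id"]] << g["answer"]
end

# Build one hash per user with keys "answer1", "answer2", ...
numbered = grouped.map do |user_id, answers|
  row = { "user_id" => user_id }
  answers.each.with_index(1) { |answer, i| row["answer#{i}"] = answer }
  row
end

numbered
  #=> [{"user_id"=>1, "answer1"=>"cupcakes", "answer2"=>"Colorado", "answer3"=>"newspaper"},
  #    {"user_id"=>2, "answer1"=>"fruitcake", "answer2"=>"Louisiana", "answer3"=>"tv"}]
```

The array form is usually easier to work with afterwards, since the numbered keys have to be rebuilt or parsed whenever the answers are iterated.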
Your expected result does not make sense. To keep the "answer" information, you need to hold the answers in an array.
data.group_by{|h| h["user_id"]}.each{|_, v| v.map!{|h| h["answer"]}}
# =>
# {
# 1=>["cupcakes", "Colorado", "newspaper"],
# 2=>["fruitcake", "Louisiana", "tv"]
# }
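A variant of the same idea that does not mutate the grouped arrays in place, as a sketch only: it swaps each plus map! for Hash#transform_values, which assumes Ruby 2.4 or later. The names user_id_to_answers and as_array are illustrative.

```ruby
data = [{"user_id"=>1, "answer"=>"cupcakes"},
        {"user_id"=>1, "answer"=>"Colorado"},
        {"user_id"=>1, "answer"=>"newspaper"},
        {"user_id"=>2, "answer"=>"fruitcake"},
        {"user_id"=>2, "answer"=>"Louisiana"},
        {"user_id"=>2, "answer"=>"tv"}]

# Group the rows by user, then reduce each group to its answers.
user_id_to_answers = data.group_by { |h| h["user_id"] }
                         .transform_values { |rows| rows.map { |h| h["answer"] } }
  #=> {1=>["cupcakes", "Colorado", "newspaper"], 2=>["fruitcake", "Louisiana", "tv"]}

# Optional: convert to the array-of-hashes shape from the question.
as_array = user_id_to_answers.map { |k, v| { "user_id" => k, "answers" => v } }
```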
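Strings like "user_id" and "answer" are redundant, and you should avoid having them in the data unless they help you in some way to make sense of it.
The output_data above is incorrect (a hash cannot contain identical keys). What you are looking for is …
@mudasobwa Yes, you are right. It was late when I asked this question. I have clarified the question a bit.
s/answer/answers/ in the resulting data set :)
Nice, clean solution. I disagree that the strings are redundant, though, since they give valuable context to consumers of the data structure. Symbols could be used for efficiency.
@fylooi, you can have your cake and eat it too by freezing the strings.
@CarySwoveland: though I believe the convention is to use symbols as hash keys.
@fylooi In that case, the result I showed can be named user_id_to_answer, and that's it. There is no need for these labels on each entry. analytics can of course tack .map { |k,v| { "user_id"=>k, "answer"=>v } } on after the last closing brace.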