Ruby: how to reorganize/filter hash values


I have an array of hashes:

data = [{"user_id"=>1, "answer"=>"cupcakes"},
 {"user_id"=>1, "answer"=>"Colorado"},
 {"user_id"=>1, "answer"=>"newspaper"},
 {"user_id"=>2, "answer"=>"fruitcake"},
 {"user_id"=>2, "answer"=>"Louisiana"},
 {"user_id"=>2, "answer"=>"tv"}]
How can I reorganize it so that it is grouped by "user_id" and lists all the "answer" values in a single hash? For example:

output_data = [{"user_id" => 1, "answer1"=>"cupcakes", "answer2"=>"Colorado", "answer3"=>"newspaper"},
{"user_id" => 2, "answer1"=>"fruitcake", "answer2"=>"Louisiana", "answer3"=>"tv"}]
Or with all the answers placed in an array:

output_data = [{"user_id" => 1, "answers"=>["cupcakes", "Colorado", "newspaper"]},
{"user_id" => 2, "answers"=>["fruitcake", "Louisiana", "tv"]}]

I am not tied to this particular output. I do need "user_id" as a key, with all of that user's answers organized together. Any suggestions?

You can do it as follows:

Code

def convert(arr)
  arr.each_with_object({}) do |g,h|
    h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
  end.map { |k,v| { "user_id"=>k, "answer"=>v } }
end
Example

convert(data)
  #=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
  #    {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]
Explanation

We have:

enum = data.each_with_object({})
  #=> #<Enumerator: [{"user_id"=>1, "answer"=>"cupcakes"},
  #                  {"user_id"=>1, "answer"=>"Colorado"},
  #                  {"user_id"=>1, "answer"=>"newspaper"},
  #                  {"user_id"=>2, "answer"=>"fruitcake"},
  #                  {"user_id"=>2, "answer"=>"Louisiana"},
  #                  {"user_id"=>2, "answer"=>"tv"}]:
  #   each_with_object({})> 
As you can see, the enumerator has six elements, each a two-element array consisting of an element of data and an initially empty hash.

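Crucially, I am using the form of Hash#update (aka merge!) that takes a block to determine the value of any key that is present in both of the hashes being merged.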
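To make the block form concrete, here is a minimal sketch on hypothetical sample data (the name totals and its contents are made up for illustration and are not from the question). The block is called only for keys found in both hashes, and it receives the key, the old value and the new value:

```ruby
# Hypothetical sample data, for illustration only.
totals = { "a" => [1], "b" => [2] }

# "b" exists in both hashes, so the block decides its value.
# "c" exists only in the argument, so it is simply added.
totals.update("b" => [3], "c" => [4]) { |_key, old, new| old + new }

totals #=> {"a"=>[1], "b"=>[2, 3], "c"=>[4]}
```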

The first element of enum is passed to the block and assigned to the block variables as follows:

g, h = enum.next
  #=> [{"user_id"=>1, "answer"=>"cupcakes"}, {}] 
g #=> {"user_id"=>1, "answer"=>"cupcakes"} 
h #=> {} 
The block calculation is therefore:

h.update(g["user_id"]=>[g["answer"]])
  # {}.update(1=>["cupcakes"])
  #=> {1=>["cupcakes"]}
h #=> {1=>["cupcakes"]}
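update's block is not used for this first merge operation because, before the merge, h has no key 1. In a later operation we again have g["user_id"] #=> 1. This time h already has the key 1, so the block is called to determine its value: it receives the key, the old value o and the new value n, and returns o+n.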
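As a sketch of that second step, assuming h and g hold the values they would have when the second element of data is processed:

```ruby
h = { 1 => ["cupcakes"] }                       # state after the first element
g = { "user_id" => 1, "answer" => "Colorado" }  # second element of data

# Key 1 is now present in both hashes, so the block is called with
# o = ["cupcakes"] and n = ["Colorado"]; its return value becomes h[1].
h.update(g["user_id"] => [g["answer"]]) { |_, o, n| o + n }

h #=> {1=>["cupcakes", "Colorado"]}
```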

This results in:

h = data.each_with_object({}) do |g,h|
  h.update(g["user_id"]=>[g["answer"]]) { |_,o,n| o+n }
end
  #=> { 1=>["cupcakes", "Colorado", "newspaper"],
  #     2=>["fruitcake", "Louisiana", "tv"] } 
It is then a simple matter to map the key-value pairs of h to the desired array of hashes.
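The mapping step can be sketched on its own, starting from the intermediate hash h. The second variant is an assumption on my part rather than part of the answer: it builds the numbered "answer1", "answer2", ... keys mentioned in the question.

```ruby
h = { 1 => ["cupcakes", "Colorado", "newspaper"],
      2 => ["fruitcake", "Louisiana", "tv"] }

# One hash per user, with all answers in an array.
as_arrays = h.map { |k, v| { "user_id" => k, "answer" => v } }

# Variant (illustrative): one numbered key per answer.
numbered = h.map do |k, v|
  v.each_with_index.each_with_object({ "user_id" => k }) do |(answer, i), row|
    row["answer#{i + 1}"] = answer
  end
end

numbered.first
  #=> {"user_id"=>1, "answer1"=>"cupcakes", "answer2"=>"Colorado", "answer3"=>"newspaper"}
```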

Alternative

Another way to do this, without merging hashes, is as follows:

data.each_with_object(Hash.new { |h,k| h[k]=[] }) do |g,h|
  h[g["user_id"]] << g["answer"]
end.map { |k,v| { "user_id"=>k, "answer"=>v } }
  #=> [{"user_id"=>1, "answer"=>["cupcakes", "Colorado", "newspaper"]},
  #    {"user_id"=>2, "answer"=>["fruitcake", "Louisiana", "tv"]}]
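The default block passed to Hash.new is what makes this version work. A short sketch of why the block form is used here rather than Hash.new([]); the variable names shared and per_key are illustrative only:

```ruby
# With a default *object*, every missing key returns the same array,
# and << mutates that array without ever storing a key.
shared = Hash.new([])
shared[1] << "cupcakes"
shared.keys  #=> []
shared[2]    #=> ["cupcakes"]  (the one shared default array)

# With a default *block*, each missing key gets its own new array,
# which the block stores in the hash before returning it.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key[1] << "cupcakes"
per_key      #=> {1=>["cupcakes"]}
```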

Your expected result does not make sense. To keep the "answer" information, you need to hold the answers in an array.

data.group_by{|h| h["user_id"]}.each{|_, v| v.map!{|h| h["answer"]}}
# =>
# {
#   1=>["cupcakes", "Colorado", "newspaper"],
#   2=>["fruitcake", "Louisiana", "tv"]
# }
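For comparison, here is a sketch of the same grouping written with Hash#transform_values instead of each plus map!. This assumes Ruby 2.4 or later, and the name by_user is mine:

```ruby
data = [{ "user_id" => 1, "answer" => "cupcakes" },
        { "user_id" => 1, "answer" => "Colorado" },
        { "user_id" => 1, "answer" => "newspaper" },
        { "user_id" => 2, "answer" => "fruitcake" },
        { "user_id" => 2, "answer" => "Louisiana" },
        { "user_id" => 2, "answer" => "tv" }]

# Group the rows by user, then replace each group of rows
# with just the answers from those rows.
by_user = data.group_by { |h| h["user_id"] }
              .transform_values { |rows| rows.map { |h| h["answer"] } }

by_user
  #=> {1=>["cupcakes", "Colorado", "newspaper"], 2=>["fruitcake", "Louisiana", "tv"]}
```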

Strings like "user_id" and "answer" are redundant, and you should avoid having them in the data unless they help in some way to make things clear.

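The output_data above is not correct (a hash cannot contain the same key more than once). What you are looking for is ...

@mudasobwa Yes, you're right. It was late when I asked this question. I have clarified the question a bit.

s/answer/answers/ in the resulting data set :)

Nice clean solution. I disagree that the strings are redundant, though, since they provide valuable context to consumers of the data structure. Symbols could be used for efficiency.

@fylooi, you can have your cake and eat it too by freezing the strings.

@CarySwoveland: though I believe the convention is to use symbols as hash keys.

@fylooi In that case, the result I showed can be named user_id_to_answer, and that's it. Those labels are not needed on every entry.

Of course, one can tack on .map { |k,v| { "user_id"=>k, "answer"=>v } } after the last closing brace.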