Json: getting all results in the same array


I have been struggling with this for hours, and I'm pretty sure I'm missing something.

Given this input:

[
{
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "1082": "4:Deadline Exceeded, trace",
    "1273": "/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/elasticsearch-transport-5.0.3/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'",
    "2492": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "3060": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
},
{
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "5570": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6103": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6188": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6695": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6721": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6728": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6747": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
},
{ 
  "LAST_JOB_POD":"gitlab-web-65-gwwwh",
  "STARTED_AT":"31-05-2018-18:18:48",
  "FINISHED":"false",
  "FIRST_INDEXED":"0",
  "LAST_INDEXED":"3143",
  "failed_projects":{
    "6760": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6939": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6941": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6942": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "6947": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)",
    "7201": "/opt/gitlab/embedded/lib/ruby/gems/2.3.0/gems/elasticsearch-transport-5.0.3/lib/elasticsearch/transport/transport/base.rb:201:in `__raise_transport_error'",
    "7707": ", trace - [\"/opt/gitlab/embedded/service/gitlab-rails/ee/lib/gitlab/elastic/indexer.rb:64:in `run_indexer!'\"",
    "7787": "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': Net::ReadTimeout (Faraday::TimeoutError)"
  }
}
]
I'm currently using jq to extract the failed_projects entries, but with

.[] | select(.failed_projects != null) | . as $object | {"failed_projects"}[]

I get the results as separate objects:

{
"1082": "...",
...
}
{
"5570": "...",
...
}
{
"6760": "...",
...
}
What I'm trying to accomplish is to group together the IDs that share the same exception. For example:

[{
"Exception": "ReadTimeout",
 [{
   "ID": 2492,
   "ID": 3060
 }]
},
{
"Exception": "Deadline Exceeded",
 [{
   "ID": 1082
 }]
}]

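The illustrative output is not valid JSON, and it has objects with duplicate keys, which is probably not what you actually want. The following jq program nevertheless produces output that matches the general problem description. Since you don't seem to have specified a precise grouping criterion, I have used the text of the error message after the last ":" as the criterion. (If you would rather consider the text after the first ":", use "^[^:]*: *" as the regex.)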
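To see how the two grouping criteria differ, here is a small Python cross-check rather than jq itself. It is only a sketch: it assumes Python's `re` module treats these two simple patterns the same way jq's regex engine does, and the helper names `after_last_colon` and `after_first_colon` are made up for illustration. The sample messages are copied from the question's input.

```python
import re

# Two sample messages copied from the question's input.
messages = [
    "4:Deadline Exceeded, trace",
    "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': "
    "Net::ReadTimeout (Faraday::TimeoutError)",
]


def after_last_colon(text):
    # The greedy ".*" consumes as much as it can, so the ":" that ends the
    # match is the last one in the string.
    return re.sub(r"^.*: *", "", text, count=1)


def after_first_colon(text):
    # "[^:]*" cannot cross a colon, so the match ends at the first one.
    return re.sub(r"^[^:]*: *", "", text, count=1)


for message in messages:
    print(repr(after_last_colon(message)))
    print(repr(after_first_colon(message)))
```

For the first message both criteria give the same text, because it contains only one colon. For the second, the last-colon criterion keeps just the tail of the message, while the first-colon criterion keeps everything after the file name, including the line number.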

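The first step gathers the .failed_projects objects together and applies to_entries, so that we can easily access both the IDs and the error message text: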

[.[] | .failed_projects | to_entries[]]
Next, we extract the grouping criterion and use it to form the groups:

| map(.value |= sub("^.*: *";""))
| group_by(.value)
Finally, we convert the groups into JSON objects of the form {GROUP: ARRAY_OF_IDS}:

| map( .[0].value as $key
       | [.[] | .key] as $value
       | {($key): $value} )
Putting the above fragments in a file, program.jq, and invoking it with:

jq -f program.jq input.json
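produces the output shown below. You will evidently need to adjust the grouping criterion. You may also wish to convert the ID strings to JSON numbers, which can be done with tonumber or, more cautiously, with (tonumber? // .)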
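The cautious conversion follows a "try to convert, otherwise keep the original" pattern. As a rough Python analogue (a sketch only: `to_number_or_keep` is a hypothetical helper, `"not-a-number"` is an invented value that does not occur in the question's data, and `int` is narrower than jq's tonumber, which also accepts non-integer numbers):

```python
def to_number_or_keep(value):
    # Try the conversion; on failure, fall back to the unchanged input,
    # which is the same idea as suppressing the error and using an alternative.
    try:
        return int(value)
    except (TypeError, ValueError):
        return value


# The first two IDs come from the question; the last value is invented
# to exercise the fallback branch.
ids = ["1082", "2492", "not-a-number"]
converted = [to_number_or_keep(item) for item in ids]
print(converted)
```

The numeric strings become integers, and the value that cannot be converted passes through unchanged instead of aborting the whole run.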

To understand program.jq, you may want to start with the first fragment and then add the other fragments one at a time.
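For readers who want to inspect the intermediate values outside jq, the three stages can be mirrored in Python. This is a sketch under stated assumptions, not a replacement for program.jq: the input is a trimmed-down subset of the question's data (only the failed_projects key, three entries), and the variable names are chosen purely for illustration.

```python
import re
from itertools import groupby

# Trimmed-down input: three entries copied from the question's data.
TIMEOUT = (
    "/opt/gitlab/embedded/lib/ruby/2.3.0/net/protocol.rb:176:in `rbuf_fill': "
    "Net::ReadTimeout (Faraday::TimeoutError)"
)
data = [
    {"failed_projects": {"1082": "4:Deadline Exceeded, trace", "2492": TIMEOUT}},
    {"failed_projects": {"5570": TIMEOUT}},
]

# Stage 1: flatten every failed_projects object into (id, message) pairs,
# comparable to collecting to_entries over all objects.
entries = [
    (key, value)
    for obj in data
    for key, value in obj["failed_projects"].items()
]

# Stage 2: keep only the text after the last ":" and group on that text.
trimmed = [(key, re.sub(r"^.*: *", "", value, count=1)) for key, value in entries]
trimmed.sort(key=lambda pair: pair[1])  # groupby only merges adjacent items
groups = groupby(trimmed, key=lambda pair: pair[1])

# Stage 3: build one {GROUP: ARRAY_OF_IDS} object per group.
result = [{label: [key for key, _ in members]} for label, members in groups]
print(result)
```

Printing `entries` and `trimmed` before the final step plays the same role as running the jq fragments one at a time.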

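Comments on the question:

.[].failed_projects | select(. != null) is a simpler version of what you have now. It is not clear what you want as the final output. Do you want to merge all the objects into one?

@chepner I updated the question with the desired output, grouping the matches by value. Anything that splits the value and keeps it matched with its IDs would do.

The illustrative output is not valid JSON and contains objects with duplicate keys. Would it be acceptable to produce valid JSON without such objects? If so, could you provide an example of the desired JSON output?

Output of jq -f program.jq input.json: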
[
  {
    "Deadline Exceeded, trace": [
      "1082"
    ]
  },
  {
    "TimeoutError)": [
      "6728",
      "6747",
      "6939",
      "5570",
      "6103",
      "6188",
      "6695",
      "6721",
      "2492",
      "6760",
      "3060",
      "6941",
      "6942",
      "6947",
      "7787"
    ]
  },
  {
    "in `__raise_transport_error'": [
      "1273",
      "7201"
    ]
  },
  {
    "in `run_indexer!'\"": [
      "7707"
    ]
  }
]