How to get the nested keys of a JSON stream using jq

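I am trying to design some relational tables to hold the parsed output of various JSON streams. The streams have very complex structures, and to make the table design easier I need to know every nested key at every level of each stream. I cannot figure out how to pull every nested key out of a stream using jq. Below is a simplified, representative JSON stream: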

{
  "startAt": 0,
  "total": 5315,
  "issues": [
    {
      "id": "44269",
      "name": "someName",
      "fields": {
        "fixVersions": [
          {
            "id": "11401",
            "releaseDate": "2016-09-30"
          }
        ],
        "status": {
          "id": "10110",
          "statusCategory": {
            "id": 3,
            "name": "Done"
          }
        }
      }
    },
    {
      "id": "44270",
      "key": "LEAD-XXXX",
      "fields": {
        "assignee": {
          "id": "10111",
          "name": "Don"
        },
        "status": {
          "id": "10110",
          "statusCategory": {
            "id": 2,
            "name": "inProgress"
          }
        }
      }
    }
  ]
}
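I expect to get the following result. If there is a better approach that would help me design the tables, I would be more than happy.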

startAt
total
issues: []
issues:id
issues:name
issues:key
issues:fields
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
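How can I get the nested keys of the above stream using jq? Your help is much appreciated.

"I would be more than happy if there is a better approach"

If I were you, I would start (and maybe end) with the following: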

[paths(scalars) | map(if type == "number" then 0 else . end)]
| unique
| .[]
With your example, and using the -cr command-line options, this produces:

["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]
["startAt"]
["total"]
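You could get closer to what you have indicated you want by mapping the number 0 to a string, but you would have to be careful about potential conflicts between that string and key names. To illustrate: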

[paths(scalars) | map(if type == "number" then "[]" else . end)]
| unique
| .[]
| join(":")
produces:

issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:id
issues:[]:key
issues:[]:name
startAt
total
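The caution about conflicts between the placeholder string and real key names can be made concrete with a small sketch. The shell function name `colon_paths` and the two tiny inputs are made up for illustration; the jq filter inside is the same one shown above.

```shell
# colon_paths: hypothetical wrapper around the filter shown above.
colon_paths() {
  jq -r '
    [ paths(scalars)
      | map(if type == "number" then "[]" else . end)   # array index -> "[]"
      | join(":") ]
    | unique
    | .[]
  ' "$@"
}

# An array under "a" and an object key literally named "[]" become
# indistinguishable once indices are replaced by the string "[]":
printf '%s\n' '{"a": [{"b": 1}]}'       | colon_paths   # prints a:[]:b
printf '%s\n' '{"a": {"[]": {"b": 1}}}' | colon_paths   # prints a:[]:b
```

If such keys can occur in your data, choosing a placeholder that cannot be a key in your sources, or keeping the paths as JSON arrays, avoids the ambiguity.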
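Note that this approach produces essentially the same results as the approach based on schema inference. That is a good thing.

Using INDEX/2

Using `unique/0` as above has two potential drawbacks: (1) the order of the output does not reflect the order in the data; (2) efficiency (though in practice this is unlikely to be a real issue unless the JSON text has a very large number of leaf paths).

In any case, `INDEX/2` can be used instead of `unique`. If your jq does not have `INDEX/2`, its definition is given below.

In brief: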

def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);

INDEX(paths(scalars)
      | map(if type == "number" then "[]" else . end); .)
| .[]
| join(":")
yields:

startAt
total
issues:[]:id
issues:[]:name
issues:[]:fields:fixVersions:[]:id
issues:[]:fields:fixVersions:[]:releaseDate
issues:[]:fields:status:id
issues:[]:fields:status:statusCategory:id
issues:[]:fields:status:statusCategory:name
issues:[]:key
issues:[]:fields:assignee:id
issues:[]:fields:assignee:name
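Paths to empty arrays

If you want empty arrays to be reported as well, you could (for example) simply change `paths(scalars)` to `(paths(scalars), paths(arrays))`.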
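Note that `paths(arrays)` emits the path of every array, not only the empty ones. If only the empty arrays should be added to the leaf paths, one possible variation is to filter on length. This is an assumption layered on the answer, not part of it; the function name `leaf_paths` and the sample input are illustrative.

```shell
# leaf_paths: hypothetical helper that lists colon-joined scalar paths
# plus the paths of EMPTY arrays only.
leaf_paths() {
  jq -r '
    [ paths(scalars), paths(arrays | select(length == 0))
      | map(if type == "number" then "[]" else . end)   # array index -> "[]"
      | join(":") ]
    | unique
    | .[]
  ' "$@"
}

# "tags" is empty, so paths(scalars) alone would never mention it.
printf '%s\n' '{"id": 1, "tags": [], "items": [{"n": 2}]}' | leaf_paths
# prints:
#   id
#   items:[]:n
#   tags
```

An empty array shows up as a bare key (`tags`) rather than `tags:[]`, because no index appears in its path.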

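If you want a schematic representation of your data, you might want to consider an approach based on schema inference.

For example, using the `schema` function (defined in schema.jq, which is referred to in the comments below), your input would produce the following inferred schema: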

{
  "startAt": "number",
  "total": "number",
  "issues": [
    {
      "fields": {
        "assignee": {
          "id": "string",
          "name": "string"
        },
        "fixVersions": [
          {
            "id": "string",
            "releaseDate": "string"
          }
        ],
        "status": {
          "id": "string",
          "statusCategory": {
            "id": "number",
            "name": "string"
          }
        }
      },
      "id": "string",
      "key": "string",
      "name": "string"
    }
  ]
}
If this is filtered through `paths(scalars)`, you get:

["startAt"]
["total"]
["issues",0,"fields","assignee","id"]
["issues",0,"fields","assignee","name"]
["issues",0,"fields","fixVersions",0,"id"]
["issues",0,"fields","fixVersions",0,"releaseDate"]
["issues",0,"fields","status","id"]
["issues",0,"fields","status","statusCategory","id"]
["issues",0,"fields","status","statusCategory","name"]
["issues",0,"id"]
["issues",0,"key"]
["issues",0,"name"]

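Except for the ordering, these results are the same as those obtained using the more direct approach; I would suggest that this validates both approaches.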
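For table design it can also help to see which JSON types occur at each leaf path without a full schema-inference module. The following is only a rough sketch of that idea, not the `schema` function referred to above; the function name `path_types`, the field names `p` and `t`, and the sample input are all made up for illustration.

```shell
# path_types: hypothetical helper that prints each normalized leaf path,
# a tab, and the set of jq types observed at that path.
path_types() {
  jq -r '
    [ paths(scalars) as $p
      | { p: ($p | map(if type == "number" then "[]" else . end) | join(":")),
          t: (getpath($p) | type) } ]
    | group_by(.p)                      # one group per distinct path
    | .[]
    | "\(.[0].p)\t\(map(.t) | unique | join("|"))"
  ' "$@"
}

# "id" is a string in one element and a number in another.
printf '%s\n' '{"issues": [{"id": "1", "n": 3}, {"id": 2}]}' | path_types
# prints (tab-separated):
#   issues:[]:id    number|string
#   issues:[]:n     number
```

A path that reports more than one type is a hint that the corresponding column needs a wider type or some cleanup before loading.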

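`paths` is definitely the way to go, but getting the exact output requested is a bit fiddly. Here is a filter that does this, except for the exact ordering: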

def normalize:    # convert paths to requested structure
    if .[-1]|type=="number" then .[-1]="[]" else . end
  | map(select(type!="number"));

def collect:      # collect unique normalized paths into an object
  reduce (paths|normalize) as $p (
     {}
   ; if getpath($p)==null then setpath($p;null) else . end
  );

def colonize($p): # convert object back into : separated paths
    keys_unsorted[] as $k
  | (if $p=="" then $k else "\($p):\($k)" end) as $n
  | $n, (.[$k] | if type=="object" then colonize($n) else empty end);

def summary:      # final output without redundant foo: if foo:[] is present 
    [ collect | colonize("") ]
  | map(select(endswith(":[]"))|.[:-3]) as $remove
  | map(select($remove[[.]]==[]));

summary[]
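Sample run (assuming the filter is in `filter.jq` and the data is in `data.json`)

Note that empty arrays are a problem here. If the data contains empty arrays, this filter reports them as ordinary fields, because the corresponding paths returned by `paths` do not end in a number. The simplest way to compensate is to map empty arrays to non-empty arrays such as `[{}]` first, e.g.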

def walk(f):  # defined here in case your jq doesn't have it
    . as $in
  | if type == "object" then reduce keys_unsorted[] as $key (
        {}; . + { ($key):  ($in[$key] | walk(f)) } ) | f
    elif type == "array" then map( walk(f) ) | f
    else f
    end;

  walk(if .==[] then [{}] else . end)
| summary[]
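To be clear, it is quite easy to write a jq filter that produces output in the originally envisioned format, though that format is unlikely to be generally useful.

The following approach avoids using `walk/1` to special-case empty arrays. It uses `unique` only because `INDEX/2` is not included in jq version 1.5 (*).

With the sample input and the `-r` command-line option, the following: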

 [paths as $p
  | if (getpath($p)|type) == "array" then $p + [" []"]
    elif ($p[-1]|type) == "number" then empty
    else $p
    end
    | map(select(type != "number"))]
 | unique
 | .[]
 | join(":")
produces:

issues: []
issues:fields
issues:fields:assignee
issues:fields:assignee:id
issues:fields:assignee:name
issues:fields:fixVersions: []
issues:fields:fixVersions:id
issues:fields:fixVersions:releaseDate
issues:fields:status
issues:fields:status:id
issues:fields:status:statusCategory
issues:fields:status:statusCategory:id
issues:fields:status:statusCategory:name
issues:id
issues:key
issues:name
startAt
total

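(*) `unique` can easily be avoided by using `INDEX/2`, as described elsewhere on this page.

You have a solid understanding of jq and real expertise in it, to say the least. I am going to learn something about jq programming just by reading the code itself. The empty array is a nice catch. For my purposes, your first solution is good enough. Thanks a lot.

Designing an RDBMS schema or table structure to hold JSON output from known sources has its own problems, because arrays and/or objects are embedded in the stream. Knowing the exhaustive object and array structure helps with defining the table structures, normalization and related design issues. That requirement is what prompted the question. The output format was just a way of expressing my thinking about the input and need not be general. The solutions provided here are a good starting point for me to dig deeper into the jq world.

@kishore - It sounds as though you need a schema inference engine, such as the one defined in schema.jq, mentioned elsewhere on this page. It can be used to infer a "universal schema" covering multiple sources, or to infer a separate schema for each source.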