How to merge nested JSON files with identical structure using jq
I need to merge an array into a series of structurally identical nested JSON files that share the same high-level keys. The goal is to create a single merged file while preserving all existing higher-level keys and values.
File 1:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
}
]
}
]
}
File 2:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
Expected result:
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
Based on answers to other questions of this kind, I tried the following, without success:
jq -s '.[0] * .[1]' test1.json test2.json
jq -s add test1.json test2.json
jq -n '[inputs[]]' test{1,2}.json
The following successfully merges the arrays, but the higher-level keys and values are missing from the result:
jq -s '.[0].regions[0].services[0] * .[1].regions[0].services[0]' test1.json test2.json
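I assume there is a simple jq solution that has eluded my searches. If not, any combination of jq and bash would do.
Here is a solution which converts the arrays into objects at the service level, merges them with * and converts the result back to array form. If file1 and file2 contain the sample data, then this command: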
$ jq -Mn --argfile file1 file1 --argfile file2 file2 '
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
# | debug # uncomment debug here to see stream
;
reduce merge as $x ({}; . * $x) # use * to recombine all the objects from merge
# | debug # uncomment debug here to see combined object
| keys[] as $a # for each key (account) of combined object
| {account:$a, regions:[ # construct object with {account, regions array}
.[$a] # for each account
| keys[] as $r # for each key (region) of account object
| {region:$r, services:[ # construct object with {region, services array}
.[$r] # for each region
| keys[] as $s # for each service
| {($s): .[$s]} # generate service object
]} # add service objects to service array
]}' # add region object to regions array
produces
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
},
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
}
]
}
]
}
]
}
Expanded explanation
Assembling the filter step by step gives a better picture of how it works. Start with this filter:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| $a
;
merge
Since there are two objects (one from file1, one from file2), this outputs the .account of each:
"123456789012"
"123456789012"
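Note that .account as $a does not change the current value of . (the input). Variables let us drill down into sub-objects without losing the higher-level context. Consider this filter: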
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| [$a, $r]
;
merge
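which outputs (account, region) pairs:
["123456789012","one"]
["123456789012","one"]
Now we can continue drilling down to the services: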
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[]
| [$a, $r, .]
;
merge
The third element of the array at that point (.) refers to each successive service in the .services array, so this filter generates
["123456789012","one",{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}]}]
["123456789012","one",{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}]
This (complete) merge function:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
;
merge
produces the stream
{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}]}}}
{"123456789012":{"one":{"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}}}
The important thing to note is that these objects can easily be combined with * in a reduce step:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| {($a): {($r): $s}} # generate object for each account,region,service
;
reduce merge as $x ({}; . * $x) # use '*' to recombine all the objects from merge
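reduce initializes its local state (.) to {} and then computes a new state for each result of the merge function by evaluating . * $x, recursively merging the objects that merge generated from $file1 and $file2: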
{"123456789012":{"one":{"groups":[{"GroupId":"123456","GroupName":"foo"},
{"GroupId":"234567","GroupName":"bar"}],
"group_policies":[{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
{"GroupName":"bar","PolicyNames":["all_bar"]}]}}}
Note that * stops merging at the arrays under the "groups" and "group_policies" keys.
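The two behaviours of * described here can be seen side by side on tiny inputs. This is a minimal sketch with made-up objects, assuming jq is on the PATH: nested objects are merged key by key, while an array on the right-hand side simply replaces the array on the left.

```shell
# Recursive merge with "*": nested objects are merged key by key...
objects=$(jq -n -c 'reduce ({"a": {"x": 1}}, {"a": {"y": 2}}) as $o ({}; . * $o)')
# ...but arrays are not objects, so the right-hand array replaces the left-hand one.
arrays=$(jq -n -c '{"a": [1, 2]} * {"a": [3]}')
echo "$objects"   # {"a":{"x":1,"y":2}}
echo "$arrays"    # {"a":[3]}
```

The same replacement is what makes .[0] * .[1] on the two whole files return only the second file's regions array.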
If we want the merge to go further, we can create more objects in the merge function. For example, consider this extension:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
merge
This merge goes deeper than the previous one, producing
{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"}}}}}
{"123456789012":{"one":{"groups":{"234567":{"GroupId":"234567","GroupName":"bar"}}}}}
{"123456789012":{"one":{"group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]}}}}}
{"123456789012":{"one":{"group_policies":{"bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}
What matters here is that the "groups" and "group_policies" keys now contain objects, which means that in this filter
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
reduce merge as $x ({}; . * $x)
the reduce with * will merge groups and group_policies instead of overwriting them, generating:
{"123456789012":{"one":{"groups":{"123456":{"GroupId":"123456","GroupName":"foo"},
"234567":{"GroupId":"234567","GroupName":"bar"}},
"group_policies":{"foo":{"GroupName":"foo","PolicyNames":["all_foo","all_bar"]},
"bar":{"GroupName":"bar","PolicyNames":["all_bar"]}}}}}
Putting this back into the original form takes a little more work, but not much:
def merge: # merge function
($file1, $file2) # process $file1 then $file2
| .account as $a # save .account in $a
| .regions[] # for each element of .regions
| .region as $r # save .region in $r
| .services[] as $s # save each element of .services in $s
| (
$s.groups[]? as $g
| {($a): {($r): {groups: {($g.GroupId): $g}}}}
), (
$s.group_policies[]? as $p
| {($a): {($r): {group_policies: {($p.GroupName): $p}}}}
)
;
reduce merge as $x ({}; . * $x)
| keys[] as $a # for each key (account) of combined object
| {account:$a, regions:[ # construct object with {account, regions array}
.[$a] # for each account
| keys[] as $r # for each key (region) of account object
| {region:$r, services:[ # construct object with {region, services array}
.[$r] # for each region
| {groups: [.groups[]]} # add groups to service
, {group_policies: [.group_policies[]]} # add group_policies to service
]}
]}
Now with this version, suppose our file2 contains a group as well as group policies, e.g.
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "999",
"GroupName": "baz"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
Where the first version of this solution produces
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
},
{
"groups": [
{
"GroupId": "999",
"GroupName": "baz"
}
]
}
]
}
]
}
this revised version produces
{
"account": "123456789012",
"regions": [
{
"region": "one",
"services": [
{
"groups": [
{
"GroupId": "123456",
"GroupName": "foo"
},
{
"GroupId": "234567",
"GroupName": "bar"
},
{
"GroupId": "999",
"GroupName": "baz"
}
]
},
{
"group_policies": [
{
"GroupName": "foo",
"PolicyNames": [
"all_foo",
"all_bar"
]
},
{
"GroupName": "bar",
"PolicyNames": [
"all_bar"
]
}
]
}
]
}
]
}
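The revised filter names groups and group_policies explicitly when it rebuilds the services array. As a hedged sketch of a variation (not part of the answer above), the same rebuild can be driven by whichever keys are actually present, so a key that is absent from every input is never iterated. The input object below is a made-up stand-in for the merged state of one region, and jq is assumed to be on the PATH.

```shell
# Rebuild a services array from a keyed object: one {name: [values...]} object per key.
# keys returns the names in sorted order; .[$k][] emits the values stored under each name.
services=$(jq -n -c '
  {"groups": {"123456": {"GroupId": "123456", "GroupName": "foo"},
              "999":    {"GroupId": "999",    "GroupName": "baz"}},
   "group_policies": {"foo": {"GroupName": "foo", "PolicyNames": ["all_foo"]}}}
  | [ keys[] as $k | {($k): [.[$k][]]} ]
')
echo "$services"
```

Because keys sorts the names, group_policies comes before groups here, as in the output of the first version of the solution.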
Combining jq add with jq gives us:
jq '.hits.hits' logs.*.json | jq -s add
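To see what this two-stage pipeline does without any log files on disk, the sketch below feeds it two inline documents. The documents are illustrative stand-ins for the logs.*.json files, and jq is assumed to be on the PATH.

```shell
# Two illustrative log documents stand in for the logs.*.json files.
# Stage 1 extracts each .hits.hits array; stage 2 slurps them and concatenates with add.
merged=$(printf '%s\n' '{"hits": {"hits": [1, 2]}}' '{"hits": {"hits": [3]}}' \
  | jq '.hits.hits' \
  | jq -s -c add)
echo "$merged"   # [1,2,3]
```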
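This pipeline merges all of the hits.hits arrays from all of the logs.*.json files into one big array.
Thank you for the quick reply and the solution. It works, and it also gives me something to study so I can understand why it works. 8^) I am a little surprised that jq does not offer a more direct way to do this.
You're welcome. The recursive * is powerful but, as you discovered, it stops recursing at non-object values. One thing I would suggest is to think more generally about how you want to merge array data at the service level. There are no conflicts in the sample data you provided, but if file2's services also contained "groups", this code would replace the groups from file1 with the groups from file2. That may not be what you want, but it is easy to correct once you know how you want to merge two service arrays.
RE: "if file2's services also contained "groups", this code would replace the groups from file1 with the groups from file2". Thanks for pointing that out. It should not be a problem for these particular AWS data sets, but it could come up in other uses.
Could you provide a step-by-step breakdown of the solution? That would greatly help those of us who are just starting up the jq learning curve to understand the solution and adapt it to other data sets and situations.
You can also try uncommenting some of the debug commands to get a better look at the data at those points.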
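On the question of a more direct route: for the specific case in the question, where both files describe the same single account and region, a shorter filter can append one file's services to the other's. This is only a sketch under that assumption (index 0 is hard-coded for regions, and nothing is matched by account or region name), and it is not the approach taken in the answer above. The inline documents are trimmed versions of the sample files, and jq is assumed to be on the PATH.

```shell
# Slurp both documents into one array, append the second document's services
# to the first document's services, and emit the first document.
# Assumes a single region at index 0 in both inputs.
file1='{"account": "123456789012", "regions": [{"region": "one", "services": [{"groups": [{"GroupId": "123456", "GroupName": "foo"}]}]}]}'
file2='{"account": "123456789012", "regions": [{"region": "one", "services": [{"group_policies": [{"GroupName": "foo", "PolicyNames": ["all_foo", "all_bar"]}]}]}]}'
combined=$(printf '%s\n' "$file1" "$file2" \
  | jq -s -c '.[0].regions[0].services += .[1].regions[0].services | .[0]')
echo "$combined"
```

With the files on disk, the same filter would be run as jq -s on test1.json and test2.json. Unlike the reduce-based solution, it does not line up regions by name, so it only fits inputs shaped like the sample.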