JSON jq: Pig Latin-style flatten function

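I've been looking for a very specific feature that may not exist in jq. If you know it isn't there, I'd appreciate a kind heads-up and some suggestions on how to work around it.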

I'm working with a public dataset. I've managed to reduce the data to lines of the following format:

[field1,field2,field3,[author1,...,authorN],[author_type1,...,author_typeN]]

The bash command I use to get this format looks like this:

find aps-dataset-metadata_subdir_path/ -name '*.json' | \
xargs cat | \
jq --compact-output \
    'select(.authors != null) | [.identifiers.doi, .date, .journal.id, [.authors[].name], [.authors[].type]]'

Note that authorN and author_typeN are in the same object in the original data (i.e., they have the same parent).

I've been looking for a way to produce the following from these lines:

[field1,field2,field3,author1,author_type1]
[field1,field2,field3,author2,author_type2]
...
...
[field1,field2,field3,authorN,author_typeN]

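The flatten function in jq seems to flatten nesting levels within a single array rather than producing new lists. If any of you know Pig Latin, what I want is exactly its FLATTEN operator.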

Again, I know it may not be implemented in jq. In that case, I might post-process the output with Python, or with any other good approach you suggest in your answers.

Thanks, everyone.
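If the Python post-processing route mentioned in the question is taken, a minimal sketch could look like the following. It assumes each input line is one compact JSON array in exactly the five-element layout shown above, and that the author list and the author-type list are parallel (same length, same order). The function names explode_row and explode_lines are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, List


def explode_row(row: List[Any]) -> Iterator[List[Any]]:
    """Turn [f1, f2, f3, [a1..aN], [t1..tN]] into N rows [f1, f2, f3, aI, tI].

    Assumes the two inner lists are parallel (same length, same order);
    zip() silently stops at the shorter one if they are not.
    """
    fields, authors, author_types = row[:3], row[3], row[4]
    for author, author_type in zip(authors, author_types):
        yield fields + [author, author_type]


def explode_lines(lines: Iterable[str]) -> Iterator[str]:
    """Read jq --compact-output lines and yield one compact JSON line per author."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        for out in explode_row(json.loads(line)):
            # separators=(",", ":") mimics jq's compact, space-free output
            yield json.dumps(out, separators=(",", ":"), ensure_ascii=False)
```

To run it on the pipeline output, one option is to add `import sys` and a loop such as `for out in explode_lines(sys.stdin): print(out)` at the bottom of the script, then pipe the jq command from the question into it.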

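Instead of visiting the authors separately in separate expressions, visit them only once. You can bind each author to a variable and access its fields later: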

select(.authors != null) | .authors[] as $author |
    [ .identifiers.doi, .date, .journal.id, $author.name, $author.type ]
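For comparison, the same "iterate over the authors once and bind each one" idea can be sketched in Python on the raw records. The key paths (identifiers.doi, date, journal.id, authors[].name, authors[].type) are taken from the jq command in the question; the function name rows_for_record is hypothetical, and the `or {}` fallback only approximates jq's behavior of yielding null for a missing parent object.

```python
from typing import Any, Dict, Iterator, List


def rows_for_record(rec: Dict[str, Any]) -> Iterator[List[Any]]:
    """Yield one [doi, date, journal_id, name, type] row per author.

    Rough analogue of:
      select(.authors != null) | .authors[] as $author |
        [ .identifiers.doi, .date, .journal.id, $author.name, $author.type ]
    """
    if rec.get("authors") is None:
        return  # like select(.authors != null): the record produces nothing
    # `or {}` approximates jq returning null when the parent object is missing
    doi = (rec.get("identifiers") or {}).get("doi")
    date = rec.get("date")
    journal_id = (rec.get("journal") or {}).get("id")
    for author in rec["authors"]:
        # name and type are read from the same author object, so they stay paired
        yield [doi, date, journal_id, author.get("name"), author.get("type")]
```

Because each author object is visited exactly once, there are never two parallel lists to re-align afterwards, which is the point of the one-step approach.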

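Jeff's suggestion of a one-step approach makes sense, but if you must convert an array of the form

[field1, field2, field3, [author1, ..., authorN], [author_type1, ..., author_typeN]]

into a stream of arrays of the form

[field1, field2, field3, authorI, author_typeI]

then a suitable jq filter would be: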

.[0:3] + ([.[3], .[4]] | transpose[])
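To see what this filter does step by step, here is a small Python mirror of it. The mirror keeps the first three fields with a [0:3] slice, since slices exclude the end index and all three leading fields are needed. [.[3], .[4]] | transpose pairs the i-th author with the i-th type, and the jq manual states that transpose pads shorter rows with null, so itertools.zip_longest (rather than plain zip, which truncates) is the closer analogue. The function name flatten_like_transpose is hypothetical.

```python
from itertools import zip_longest
from typing import Any, Iterator, List


def flatten_like_transpose(row: List[Any]) -> Iterator[List[Any]]:
    """Python analogue of the jq filter: .[0:3] + ([.[3], .[4]] | transpose[])"""
    head = row[0:3]  # .[0:3] -> indices 0, 1 and 2 (end index excluded)
    # [.[3], .[4]] | transpose[] -> one [authorI, typeI] pair per index,
    # padding the shorter list with None (null in jq)
    for pair in zip_longest(row[3], row[4], fillvalue=None):
        yield head + list(pair)
```

With parallel lists of equal length the padding never triggers, and each emitted row has the [field1, field2, field3, authorI, author_typeI] shape.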
