Merging JSON files with the same structure into a JSON file containing lists


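I have a number of JSON files that all share the same structure (the same keys everywhere; only the values of some keys may differ). I would like to collect the values associated with certain keys into lists, and store those lists as the values of those keys in a new JSON file.

For example, consider these three files, where I am interested in the key number_items and its values. First file -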

[
  {
    "box_id": 1,
    "number_items": 4
  },
  {
    "box_id": 3,
    "number_items": 15
  },
  {
    "box_id": 6,
    "number_items": 2
  }
]

Second file -

[
  {
    "box_id": 1,
    "number_items": 7
  },
  {
    "box_id": 3,
    "number_items": 15
  },
  {
    "box_id": 6,
    "number_items": 4
  }
]

Third file -

[
  {
    "box_id": 1,
    "number_items": 5
  },
  {
    "box_id": 3,
    "number_items": 9
  },
  {
    "box_id": 6,
    "number_items": 0
  }
]

These should be merged into something like this -

[
  {
    "box_id": 1,
    "number_items": [
      4,
      7,
      5
    ]
  },
  {
    "box_id": 3,
    "number_items": [
      15,
      15,
      9
    ]
  },
  {
    "box_id": 6,
    "number_items": [
      2,
      4,
      0
    ]
  }
]

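Can this be done with jq? If not, what would be a good way to do it? Note that the real scenario involves more than 150 files, each containing 3 keys whose values I want to merge into lists.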

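Depending on where you want to save the new file (locally vs. on a server), there are a few different approaches. As far as I know, you cannot save a file locally without using one of the available plugins. If you want to save it to a server, that is not possible with JavaScript alone; it is better to use a back-end language.

Here is one way to combine the contents of multiple JSON files into the desired format: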

// pass the json files you want combined, plus a path and name for the new file (path/to/filename.json)
function combineJsonFiles(files, newFileName) {
  // start loading every file; requests stay in the same order as `files`
  var requests = $.map(files, function(fileName) {
    return loadJsonFile(fileName);
  });
  // wait until ALL files are loaded before combining, so that values are
  // appended in file order rather than in the order the requests finish
  return $.when.apply($, requests).then(function() {
    // with a single request, $.when passes (data, status, jqXHR) directly;
    // with several, each argument is an array [data, status, jqXHR]
    var results = requests.length === 1 ? [arguments] : arguments;
    var combinedJson = [];
    $.each(results, function(index, result) {
      // combine json from this file with the combinedJson array
      combinedJson = combineJson(result[0], combinedJson);
    });
    // puts into json format; the string is now ready to be saved as newFileName
    return JSON.stringify(combinedJson);
  });
}

function loadJsonFile(fileName) {
  return $.getJSON(fileName);
}



function combineJson(boxes, combinedJson) {
  // iterate through each box 
  $.each(boxes, function(key, box) {
    // use grep to search if this box's id is already included
    var matches = $.grep(combinedJson, function(e) { return e.box_id == box.box_id; });

    // if there are no matches, add box to the combined file
    if (matches.length == 0) {

      var newBox = { box_id: box.box_id };

      // iterate through properties of box
      for (var property in box) {
        // check to ensure that properties are not inherited from base class
        if (box.hasOwnProperty(property)) {
          // will ignore if property is box_id
          if (property !== 'box_id') {
            // box is reformatted to make the property type into array
            newBox[property] = [box[property]];
          }
        }
      }
      combinedJson.push(newBox);
    } else {
      // select first match (there should never be more than one)
      var match = matches[0];

      // iterate through properties of box
      for (var property in box) {
        // check to ensure that properties are not inherited from base class
        if (box.hasOwnProperty(property)) {
          // will ignore if property is box_id
          if (property !== 'box_id') {
            // add property to the already existing box in the combined file
            match[property].push(box[property]);
          }
        }
      }
    }
  });
  return combinedJson;
}

  var jsonFiles = ['path/to/data.json', 'path/to/data2.json', 'path/to/data3.json'];

  combineJsonFiles(jsonFiles, 'combined_json.json');

The JSON output for these files looks like this:

[{"box_id":1,"number_items":[4,7,5]},{"box_id":3,"number_items":[15,15,9]},{"box_id":6,"number_items":[2,4,0]}]

Hope this helps.

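You can merge files with a similar structure simply by passing all of them in as input. Their contents will be streamed in order.

You can then read them into a single array, group the objects by box_id, and map each group to the desired result.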

$ jq -n '
    [inputs[]] | group_by(.box_id)
        | map({box_id:.[0].box_id, number_items:map(.number_items)})
' input{1,2,3}.json

Produces:

[
  {
    "box_id": 1,
    "number_items": [
      4,
      7,
      5
    ]
  },
  {
    "box_id": 3,
    "number_items": [
      15,
      15,
      9
    ]
  },
  {
    "box_id": 6,
    "number_items": [
      4,
      2,
      0
    ]
  }
]

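It appears that on some platforms the order of items is not preserved when grouping. In my case, running the Windows 64-bit build produced the output above. So keep this in mind if you want to use group_by. There are of course other approaches if you want to avoid this filter, but it is the more convenient one to use.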
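For comparison, the same grouping can be sketched outside jq in a way that does not involve any sort, so the order within each list always follows the order of the files. This is only an illustrative sketch: the `files` array is a hypothetical in-memory stand-in for the parsed file contents, and `groupByBoxId` is a name made up for this example. Unlike group_by, which orders the groups by key, this version keeps the boxes in first-seen order.

```javascript
// Hypothetical stand-ins for the parsed contents of the three input files.
const files = [
  [{ box_id: 1, number_items: 4 }, { box_id: 3, number_items: 15 }, { box_id: 6, number_items: 2 }],
  [{ box_id: 1, number_items: 7 }, { box_id: 3, number_items: 15 }, { box_id: 6, number_items: 4 }],
  [{ box_id: 1, number_items: 5 }, { box_id: 3, number_items: 9 }, { box_id: 6, number_items: 0 }],
];

// Collect number_items per box_id. A Map remembers keys in first-insertion
// order, and values are appended file by file, so nothing gets reordered.
function groupByBoxId(fileContents) {
  const groups = new Map();
  for (const boxes of fileContents) {
    for (const box of boxes) {
      if (!groups.has(box.box_id)) {
        groups.set(box.box_id, []);
      }
      groups.get(box.box_id).push(box.number_items);
    }
  }
  // Turn the Map entries back into the array-of-objects shape from the question.
  return Array.from(groups, ([box_id, number_items]) => ({ box_id, number_items }));
}

const grouped = groupByBoxId(files);
console.log(JSON.stringify(grouped, null, 2));
```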

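"I wish to collect the values associated with certain keys"

Here is a solution that treats all keys other than the grouping key in the same way. It also handles missing keys gracefully and does not depend on the stability of jq's sort. The solution is based on a generic filter, merge/0, defined as follows: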
# Combine an array of objects into a single object, ans, with array-valued keys,
# such that for every key, k, in the i-th object of the input array, a,
# ans[k][i] = a[i][k]
# null is used as padding if a value is missing.
# Example:
# [{a:1, b:2}, {b:3, c:4}] | merge
# produces:
# {"a":[1,null],"b":[2,3],"c":[null,4]}
def merge:
  def allkeys: map(keys) | add | unique;
  allkeys as $allkeys
  | reduce .[] as $in ({};
     reduce $allkeys[] as $k (.;
      . + {($k): (.[$k] + [$in[$k]]) } ));
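To make the behaviour of the filter easier to follow, here is a rough JavaScript sketch of the same idea. It is not a translation of the jq code line by line, and the function name `merge` is reused here only for illustration: it collects the union of keys, then builds one array per key, padding with null wherever an object lacks that key.

```javascript
// Combine an array of objects into one object with array-valued keys,
// so that result[k][i] is objects[i][k], or null if that key is missing.
function merge(objects) {
  // Union of all keys, sorted (jq's `unique` also returns a sorted list).
  const allKeys = [...new Set(objects.flatMap((obj) => Object.keys(obj)))].sort();
  const result = {};
  for (const key of allKeys) {
    result[key] = objects.map((obj) =>
      Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : null
    );
  }
  return result;
}

// Same example as in the comment on the jq definition.
const merged = merge([{ a: 1, b: 2 }, { b: 3, c: 4 }]);
console.log(JSON.stringify(merged));
// {"a":[1,null],"b":[2,3],"c":[null,4]}
```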

The solution to the given problem can then be written as:
transpose | map(merge) | map( .box_id |= .[0] )

Invocation:
  jq -s -f merge.jq input{1,2,3}.json

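Output: as shown in the question.
A more robust solution
The solution above assumes that the objects appear in the same box_id order in every file. The OP's requirements seem to justify this assumption, but for safety and robustness the objects should first be sorted: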
map(sort_by(.box_id)) | transpose | map( merge | (.box_id |= .[0]) )

Note that this still assumes that no box_id value is missing from any of the input files.
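The sort, transpose and merge steps can be illustrated with a small JavaScript sketch. It makes several assumptions that are not part of the answer: the data is already parsed into a hypothetical `files` array, box_id values are numeric, there is at least one file, and every file contains the same set of boxes. The helper names `mergeObjects` and `combine` are made up for this example.

```javascript
// Hypothetical parsed inputs; the second file is deliberately out of order.
const files = [
  [{ box_id: 1, number_items: 4 }, { box_id: 3, number_items: 15 }, { box_id: 6, number_items: 2 }],
  [{ box_id: 6, number_items: 4 }, { box_id: 1, number_items: 7 }, { box_id: 3, number_items: 15 }],
  [{ box_id: 1, number_items: 5 }, { box_id: 3, number_items: 9 }, { box_id: 6, number_items: 0 }],
];

// Array of objects -> one object with array-valued keys (null as padding).
function mergeObjects(objects) {
  const keys = [...new Set(objects.flatMap((obj) => Object.keys(obj)))].sort();
  const result = {};
  for (const key of keys) {
    result[key] = objects.map((obj) =>
      Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : null
    );
  }
  return result;
}

function combine(fileContents) {
  // Sort a copy of each file by box_id so that position i means the same box everywhere.
  const sorted = fileContents.map((boxes) =>
    [...boxes].sort((a, b) => a.box_id - b.box_id)
  );
  // Transpose: column i holds the i-th box of every file.
  const columns = sorted[0].map((_, i) => sorted.map((row) => row[i]));
  // Merge each column, then collapse the repeated box_id back to a single value.
  return columns.map((column) => {
    const merged = mergeObjects(column);
    merged.box_id = merged.box_id[0];
    return merged;
  });
}

const combined = combine(files);
console.log(JSON.stringify(combined));
```

Because each file is sorted before transposing, the shuffled second file still contributes its values to the right boxes.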
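An even more robust solution
If some box_id values may be missing from some of the input files, the missing values can be added. This can be done with the following filter: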
# Input: a matrix of objects (that is, an array of rows of objects),
#   each of which is assumed to have a distinguished field, f,
#   with distinct values on each row;
# Output: a rectangular matrix such that every row, r, of the output
#   matrix includes the elements of the corresponding row of the input
#   matrix, with additional elements as necessary so that (r |
#   map(f) | sort) is the same for all rows r.
#
def rectanglize(f):
  def ids: [.[][] | f] | unique;
  def it: . as $in | {} | (f = $in);
  ids as $ids
  | map( . + ( $ids - [.[]|f] | map(it) ) )
;  
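The padding step may be easier to picture with a small JavaScript sketch. It is an approximation under stated assumptions, not the filter itself: the jq version takes a path expression f, whereas this hypothetical `rectanglize` function takes a plain field name, and the sample `rows` are invented to show a file in which one box is missing.

```javascript
// Pad every row with placeholder objects so that all rows cover the same ids.
function rectanglize(rows, field) {
  // All distinct values of the field across every row.
  const ids = [...new Set(rows.flat().map((obj) => obj[field]))];
  return rows.map((row) => {
    const present = new Set(row.map((obj) => obj[field]));
    // One placeholder per missing id, carrying only the distinguished field.
    const missing = ids
      .filter((id) => !present.has(id))
      .map((id) => ({ [field]: id }));
    return [...row, ...missing];
  });
}

// Hypothetical input: box 3 is absent from the second file.
const rows = [
  [{ box_id: 1, number_items: 4 }, { box_id: 3, number_items: 15 }],
  [{ box_id: 1, number_items: 7 }],
];

const padded = rectanglize(rows, "box_id");
console.log(JSON.stringify(padded));
```

The placeholder objects have no number_items key, so a later merge step would record null for that file.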

Putting it all together, the main pipeline becomes:
rectanglize(.box_id)
| map(sort_by(.box_id))
| transpose
| map( merge | .box_id |= .[0] )

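Jeff - when I run your program, I get [2,4,0] for the last number_items array, as expected. Did you really get [4,2,0]?

Hm, I copied the data and the output I received as is. I would expect [2,4,0] as well. It looks like it was reordered by the grouping, which is not what I expected.

Quick follow-up question: how would I display the sum of the number of items per file? Using jq -s 'map(.[].number_items) | add' input{1,2,3}.json returns 61, i.e. the sum of all items across all files. Of course I could run the command manually for each file and collect everything, but I wonder whether jq can do this for me? (On Linux, I would use for k in *.json; do jq -s 'map(.[].number_items) | add' "$k"; done to accomplish this.)

@Ailurus: For per-file sums, don't combine the files in the first place. You can even get the name of the file currently being processed. I would do it like this: jq '{file: input_filename, sum: map(.number_items) | add}' *.json

This is a very nice approach; it is generally applicable and very robust. Thanks! Still, Jeff's answer is easier to understand, simple to run, and does what I need for my simple data files, so I will mark his answer as the accepted one.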