JavaScript: looping over two JSON files with 200k records each


I have two large JSON files, each containing 200k objects. When I loop over both of them to match records on a common id, execution takes a very long time.

Implementation 1

for (var i in matterData.data) {
  const fobj = matterData.data[i];

  const  ma_array = [];
  for (var j in activityData.data) {
    const aobj = activityData.data[j];
    if (fobj.id === aobj.matter.id) {
      ma_array.push(aobj);
    }
    if (ma_array.length > 0) fobj.activities = ma_array;
  }
}
Implementation 2

for (var i in matterData.data) {
  //Activities
  matters_array = [];
  matters_array = activityData.data.filter(function (el) {
    if (el.matter !== null) return el.matter.id == matterData.data[i].id;
  });
  if (matters_array.length > 0) matterData.data[i]["activities"] = matters_array;
}
Implementation 3

for (var i in matterData.data) {
  matters_array = [];

  for (var j in activityData.data) {
    if (activityData.data[j]["matter"] !== null) {
        if (matterData.data[i].id === activityData.data[j]["matter"].id) {
            matters_array.push(activityData.data[j]);
        }
        if (matters_array.length > 0) matterData.data[i]["activities"] = matters_array;
    }
  }
}
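Every one of these implementations takes a long time to execute.

Each entry in activityData has a matter.id that corresponds to an id in matterData.

Any insights would be appreciated.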

Matter data and activity data


var matterData = {
  "data": [
    {
      "id": 1055395769,
      "description": "Peters",
      "status": "Pending",
      "location": null,
      "client_reference": "1532",
      "billable": true,
      "billing_method": "hourly",
      "open_date": "2019-06-05",
      "close_date": null
    }
  ]
};
var activityData = {
  "data": [
    {
      "id": 285568423,
      "type": "ExpenseEntry",
      "date": "2011-01-01",
      "quantity_in_hours": 1,
      "rounded_quantity_in_hours": 1,
      "quantity": 1,
      "rounded_quantity": 1,
      "price": 100,
      "matter": {
        "id": 1055395769
      }
    },
    {
      "id": 285568428,
      "type": "MonEntry",
      "matter": {
        "id": 1055395769
      }
    },
    {
      "id": 285568442,
      "type": "EEntry",
      "matter": {
        "id": 1055395769
      }
    }
  ]
};


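You are running an O(n^2) loop of roughly 200,000 × 200,000 = 4 × 10^10 iterations, which is a huge amount of work. You can reduce this complexity by using a map: store all the values of activityData.data in a map keyed by the shared id, then iterate over matterData.data and check whether each id exists in the map.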
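A minimal sketch of the variant described above, which groups the activities by matter.id first and then walks the matters once. The tiny data set and the name activitiesByMatterId are assumptions made for illustration; only the shape of matterData and activityData comes from the question.

```javascript
// Tiny stand-in data shaped like the question's matterData / activityData.
const matterData = { data: [{ id: 1 }, { id: 2 }, { id: 3 }] };
const activityData = {
  data: [
    { id: 10, matter: { id: 1 } },
    { id: 11, matter: { id: 1 } },
    { id: 12, matter: { id: 3 } },
    { id: 13, matter: null }, // activities without a matter are skipped
  ],
};

// Pass 1: group activities by matter.id (one visit per activity).
const activitiesByMatterId = new Map();
for (const activity of activityData.data) {
  if (activity.matter == null) continue;
  const key = activity.matter.id;
  const group = activitiesByMatterId.get(key);
  if (group) group.push(activity);
  else activitiesByMatterId.set(key, [activity]);
}

// Pass 2: attach each group to its matter (one visit per matter).
for (const matter of matterData.data) {
  const group = activitiesByMatterId.get(matter.id);
  if (group) matter.activities = group;
}

console.log(JSON.stringify(matterData.data));
```

With n matters and m activities this does about n + m steps instead of n × m, and, like the question's loops, it only sets activities on matters that have at least one.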

Implementation


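First create a map with the id as the key and the matter object as the value, and give each matter an empty activities array. Then iterate over the activities and push each one into the array of its matter.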

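// Index every matter by id and give each one an empty activities array.
const map = {};
for (const matter of matterData.data) {
  map[matter.id] = matter;
  matter.activities = [];
}

// Attach each activity to its matter in a single pass.
for (const activity of activityData.data) {
  // Skip activities with no matter or with an unknown matter id.
  if (activity.matter == null) continue;
  const matter = map[activity.matter.id];
  if (matter) matter.activities.push(activity);
}

for (const matter of matterData.data) {
  console.log(matter);
}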
This solution only works if matterData.data[i].id is unique.

Set.has is O(1) on average, so I think it can be used effectively here.

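// Collect the matter ids that have at least one activity.
// The arrow function must return the id; wrapping it in braces returned undefined.
const subSet = new Set(
  activityData.data
    .filter((obj) => obj.matter !== null)
    .map((obj) => obj.matter.id)
);

for (var i in matterData.data) {
  const ma_array = [];
  if (subSet.has(matterData.data[i].id)) {
    // ...
  }
}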

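You might want to look into sorting the JSON by id first, although I'm not sure what the ids look like. If the id is a number, sort the data and then use a binary search. The choice of sorting algorithm is up to you, since there are many options.

Updated the JSON with 2 objects. The code above needs to be changed.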
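As a rough illustration of the sort-then-binary-search suggestion, the sketch below sorts the activities by matter.id and finds each matter's run of activities with a lower-bound binary search. It assumes numeric ids, as in the sample data; the helper lowerBound and the tiny data set are hypothetical and not part of the original answer.

```javascript
// Tiny stand-in data shaped like the question's matterData / activityData.
const matterData = { data: [{ id: 3 }, { id: 1 }, { id: 2 }] };
const activityData = {
  data: [
    { id: 12, matter: { id: 3 } },
    { id: 10, matter: { id: 1 } },
    { id: 13, matter: null },
    { id: 11, matter: { id: 1 } },
  ],
};

// Keep only activities that reference a matter, then sort by matter.id.
// filter() returns a new array, so the original order is left untouched.
const sorted = activityData.data
  .filter((a) => a.matter != null)
  .sort((a, b) => a.matter.id - b.matter.id);

// Return the first index whose matter.id is >= target.
function lowerBound(items, target) {
  let lo = 0;
  let hi = items.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (items[mid].matter.id < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

for (const matter of matterData.data) {
  // Activities for one matter are contiguous after sorting.
  const start = lowerBound(sorted, matter.id);
  let end = start;
  while (end < sorted.length && sorted[end].matter.id === matter.id) end++;
  if (end > start) matter.activities = sorted.slice(start, end);
}
```

Sorting m activities costs O(m log m) and each of the n lookups costs O(log m), so this is far cheaper than the nested loops, though a map keyed by id avoids the sort entirely.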