Python 如何优化递归函数_Python

Python 如何优化递归函数

python

Python 如何优化递归函数,python,Python,我目前有一个函数，它应该更新每个字典键，其中的值是旧的_id，而新的uuid的值此函数按预期工作，但效率不高，并且使用我的当前数据集执行时间过长必须发生的事情：每个dict[''u id']['$oid']都必须更改为新的uuid 必须将与旧值匹配的每个$oid更改为新uuid 问题是每个字典中都有许多$oid键，我想确保它在更高效的情况下正常工作到目前为止，使用我当前的数据集执行脚本需要45-55秒，它对递归函数进行了4500万次调用recursive\u id\u replace 如

我目前有一个函数，它应该更新每个字典键，其中的值是旧的_id，而新的uuid的值

此函数按预期工作，但效率不高，并且使用我的当前数据集执行时间过长

必须发生的事情：

每个

dict[''u id']['$oid']

都必须更改为新的uuid

必须将与旧值匹配的每个

$oid

更改为新uuid

问题是每个字典中都有许多

$oid

键，我想确保它在更高效的情况下正常工作

到目前为止，使用我当前的数据集执行脚本需要45-55秒，它对递归函数进行了4500万次调用

recursive\u id\u replace

如何优化当前函数以使脚本运行得更快

数据作为字典传递到此函数，该字典如下所示：

file_data = {
    'some_file_name.json': [{...}],
    # ...
}

以下是数据集中一个词典的示例：

[
  {
    "_id": {
      "$oid": "4e0286ed2f1d40f78a037c41"
    },
    "code": "HOME",
    "name": "Home Address",
    "description": "Home Address",
    "entity_id": {
      "$oid": "58f4eb19736c128b640d5844"
    },
    "is_master": true,
    "is_active": true,
    "company_id": {
      "$oid": "591082232801952100d9f0c7"
    },
    "created_at": {
      "$date": "2017-05-08T14:35:19.958Z"
    },
    "created_by": {
      "$oid": "590b7fd32801952100d9f051"
    },
    "updated_at": {
      "$date": "2017-05-08T14:35:19.958Z"
    },
    "updated_by": {
      "$oid": "590b7fd32801952100d9f051"
    }
  },
  {
    "_id": {
      "$oid": "01c593700e704f29be1e3e23"
    },
    "code": "MAIL",
    "name": "Mailing Address",
    "description": "Mailing Address",
    "entity_id": {
      "$oid": "58f4eb1b736c128b640d5845"
    },
    "is_master": true,
    "is_active": true,
    "company_id": {
      "$oid": "591082232801952100d9f0c7"
    },
    "created_at": {
      "$date": "2017-05-08T14:35:19.980Z"
    },
    "created_by": {
      "$oid": "590b7fd32801952100d9f051"
    },
    "updated_at": {
      "$date": "2017-05-08T14:35:19.980Z"
    },
    "updated_by": {
      "$oid": "590b7fd32801952100d9f051"
    }
  }
]

功能如下：

def id_replace(_ids: set, uuids: list, file_data: dict) -> dict:
    """ Replaces all old _id's with new uuid. 
        checks all data keys for old referenced _id and updates it to new uuid value.
    """

    def recursive_id_replace(prev_val, new_val, data: Any):
        """
        Replaces _ids recursively.
        """
        data_type = type(data)

        if data_type == list:
            for item in data:
                recursive_id_replace(prev_val, new_val, item)

        if data_type == dict:
            for key, val in data.items():

                val_type = type(val)

                if key == '$oid' and val == prev_val:
                    data[key] = new_val

                elif val_type == dict:
                    recursive_id_replace(prev_val, new_val, val)

                elif val_type == list:
                    for item in val:
                        recursive_id_replace(prev_val, new_val, item)

    for i, _id in enumerate(_ids):
        recursive_id_replace(_id, uuids[i], file_data)

    return file_data

这里有一种技术涉及使用

json

字符串化数据，然后进行替换

In [415]: old_uid = "590b7fd32801952100d9f051"

In [416]: new_uid = "test123"

In [417]: json.loads(json.dumps(data).replace('"$oid": "%s"' %old_uid, '"$oid": "%s"' %new_uid))
Out[417]: 
[{'_id': {'$oid': '4e0286ed2f1d40f78a037c41'},
  'code': 'HOME',
  'company_id': {'$oid': '591082232801952100d9f0c7'},
  'created_at': {'$date': '2017-05-08T14:35:19.958Z'},
  'created_by': {'$oid': 'test123'},
  'description': 'Home Address',
  'entity_id': {'$oid': '58f4eb19736c128b640d5844'},
  'is_active': True,
  'is_master': True,
  'name': 'Home Address',
  'updated_at': {'$date': '2017-05-08T14:35:19.958Z'},
  'updated_by': {'$oid': 'test123'}},
 {'_id': {'$oid': '01c593700e704f29be1e3e23'},
  'code': 'MAIL',
  'company_id': {'$oid': '591082232801952100d9f0c7'},
  'created_at': {'$date': '2017-05-08T14:35:19.980Z'},
  'created_by': {'$oid': 'test123'},
  'description': 'Mailing Address',
  'entity_id': {'$oid': '58f4eb1b736c128b640d5845'},
  'is_active': True,
  'is_master': True,
  'name': 'Mailing Address',
  'updated_at': {'$date': '2017-05-08T14:35:19.980Z'},
  'updated_by': {'$oid': 'test123'}}]

记住，它只适用于列表和字典。其他数据结构将被隐式转换或抛出错误

列出了更多方法。

这有帮助吗？建议您首先评测代码，以确定在何处尝试优化它。请参阅@martineau我已经利用了这个脚本，这就是我如何知道函数被调用的原因。@cᴏʟᴅsᴘᴇᴇᴅ 我尝试了您在票据中显示的方式，将数据编组到str并使用.replace，执行速度提高了50倍。我得到了45-55秒，现在脚本在1秒内执行。如果您想创建一个答案，我将确保选择它并对其进行向上投票。好的，完成：）