Python: checking the efficiency of code

python algorithm

I have two files.

tree_0, which has this form:

443457316403167232  823615  Tue Mar 11 18:43:57 +0000 2014  2   
452918771813203968  26558552    Tue Mar 11 21:10:17 +0000 2014  0   
443344824096538625  375391930   Tue Mar 11 11:16:57 +0000 2014  9   
452924891285581824  478500516   Tue Mar 11 11:38:14 +0000 2014  0

trees.json

{"reply": 0, "id": 452918771813203968, "children": [{"reply": 0, "id": 452924891285581824, "children": []}]}

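Now I have to traverse the trees.json file and look up each id in tree_0; if it exists there, I have to perform some task.

I have loaded tree_0 using readlines(). Both files are very large (about 10 GB each). I have written some code, but I would like to know whether it is correct, or whether there is a more efficient way. For every id, it goes through the whole of tree_0 (the while loop).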

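import json
import sys
sys.setrecursionlimit(2000)
fr = open('tree_0', 'r')
lines = fr.readlines()
l = len(lines)
# to find the children of a tree; this works fine
def get_children(node):
    stack = [node]
    while stack:
        node = stack.pop()
        stack.extend(node['children'][::-1])
        yield node
f = open('trees.json', 'r')
linenum = 0
for line in f:
    d = json.loads(line)
    child_dic = {}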
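    if (linenum

I think you are doing a lot of unnecessary and inefficient work here. First, since you only need the IDs, you do not have to keep the whole tree_0 file in memory. Instead of iterating over all the lines and extracting the ID on every lookup, do it just once, while loading the file. You can also store the IDs in a set, which makes each lookup much faster than scanning a list.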
with open('tree_0') as f:
    all_ids = set(int(line.split('\t')[0]) for line in f)
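
As a minimal sketch of this idea, the snippet below builds the ID set from the four sample rows shown in the question. It uses an in-memory `io.StringIO` in place of the real tree_0 file so that it runs on its own, and the helper name `load_ids` is hypothetical. It also assumes that splitting on any whitespace is acceptable, since the rows as displayed above may not preserve the original tab characters.

```python
import io

# The four sample rows from the question; io.StringIO stands in for
# open('tree_0') so the example is self-contained.
SAMPLE_TREE_0 = (
    "443457316403167232\t823615\tTue Mar 11 18:43:57 +0000 2014\t2\n"
    "452918771813203968\t26558552\tTue Mar 11 21:10:17 +0000 2014\t0\n"
    "443344824096538625\t375391930\tTue Mar 11 11:16:57 +0000 2014\t9\n"
    "452924891285581824\t478500516\tTue Mar 11 11:38:14 +0000 2014\t0\n"
)


def load_ids(lines):
    """Return the set of integer IDs taken from the first column of each line."""
    # Blank lines are skipped; split() with no argument splits on any whitespace.
    return {int(line.split()[0]) for line in lines if line.strip()}


all_ids = load_ids(io.StringIO(SAMPLE_TREE_0))

# A set membership test takes constant time on average, whereas scanning a
# list of lines takes time proportional to the number of lines.
print(452918771813203968 in all_ids)
print(1 in all_ids)
```

Only the IDs are kept, so the memory needed is a small fraction of what readlines() requires for the full file.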

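If you also need the other fields from tree_0, you can use a dictionary instead, mapping each ID to those other fields. This will still be much faster than looping over the list each time.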
with open('tree_0') as f:
    all_ids = dict((int(items[0]), items) for items in (line.split('\t') for line in f))

With this change, the rest of your code boils down to:
with open('trees.json') as f: 
    for line in f:
        d = json.loads(line)
        for child in get_children(d):
            if child["id"] in all_ids:
                # optional: get other stuff from dict
                # other_stuff = all_ids[child["id"]]
                print("Perform some task here")
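
To see the pieces working together, here is a small self-contained sketch that runs the loop above on the single sample tree from the question. The `get_children` generator is the one from the question; the wrapper `matching_ids` and the use of `io.StringIO` in place of trees.json are assumptions made for illustration only.

```python
import io
import json


def get_children(node):
    """Yield a node and all of its descendants, depth-first, without recursion."""
    stack = [node]
    while stack:
        node = stack.pop()
        # Push children in reverse so they are visited in their original order.
        stack.extend(node["children"][::-1])
        yield node


def matching_ids(tree_lines, all_ids):
    """Yield the ID of every tree node whose ID also appears in all_ids."""
    for line in tree_lines:
        tree = json.loads(line)
        for child in get_children(tree):
            if child["id"] in all_ids:
                yield child["id"]


# IDs from the sample tree_0 rows in the question.
all_ids = {
    443457316403167232,
    452918771813203968,
    443344824096538625,
    452924891285581824,
}

# The sample line from trees.json, one JSON document per line.
trees = io.StringIO(
    '{"reply": 0, "id": 452918771813203968, "children": '
    '[{"reply": 0, "id": 452924891285581824, "children": []}]}\n'
)

found = list(matching_ids(trees, all_ids))
print(found)
```

Because trees.json is read one line at a time, only a single tree has to be in memory at any moment; the set of IDs is the only structure that grows with the size of tree_0.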


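Update: if the IDs in tree_0 are not unique, i.e. if you have multiple lines with the same ID, you can use, for example, a defaultdict that maps each ID to a list of the other attributes, like this: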
import collections

with open('tree_0') as f:
    all_ids = collections.defaultdict(list)
    for line in f:
        items = line.split('\t')
        all_ids[int(items[0])].append(items)

Then, in the other part of the code, just perform the task for every entry in the list:
            if child["id"] in all_ids:
                for other_stuff in all_ids[child["id"]]:
                    print("Perform some task here", other_stuff)
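
The grouping of duplicate IDs can be tried out in isolation with the sketch below. The first input row is taken from the question; the second row is an invented duplicate of the same ID (with a different last field), added purely to illustrate the behaviour, and `io.StringIO` again stands in for the real file.

```python
import collections
import io

# Hypothetical input: row 1 is from the question, row 2 is an invented
# duplicate of the same ID, present only to demonstrate the grouping.
rows = io.StringIO(
    "452918771813203968\t26558552\tTue Mar 11 21:10:17 +0000 2014\t0\n"
    "452918771813203968\t26558552\tTue Mar 11 21:10:17 +0000 2014\t1\n"
)

# Map each ID to the list of all rows (as lists of fields) that carry it.
all_ids = collections.defaultdict(list)
for line in rows:
    items = line.rstrip("\n").split("\t")
    all_ids[int(items[0])].append(items)

node_id = 452918771813203968
if node_id in all_ids:
    for other_stuff in all_ids[node_id]:
        print("Perform some task here", other_stuff)

# Use `in` (or .get) for lookups: indexing a missing key with all_ids[key]
# would insert an empty list into the defaultdict as a side effect.
```

One thing to keep in mind with this design is that the dictionary now holds every field of every row, so its memory use is closer to that of the whole file than the plain set of IDs.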

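This question might be a better fit for another site. Given that these files are called trees, one would almost suspect that they already have some kind of key-ordered structure.

No, but I have to use the other fields further (the columns that go with data[0]). Thanks a lot. This seems to be much faster.

When I run the code, there are duplicate entries for the key. How can I handle the duplicate entries in the edit above?

@Saurabh What exactly do you mean? Are there multiple lines with the same ID in tree_0, or multiple entries with the same ID in trees.json? And in the latter case, do you want it for each line, or for all the unique entries in trees.json?

I have multiple lines with the same id in tree_0.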