Python: improving performance and/or approach to avoid memory errors when creating, saving, and deleting variables


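I had been fighting with a function that kept raising a MemoryError, and thanks to your support I managed to work around it. However, since I am not a professional programmer, I would like your opinion on my approach and on how to improve its performance (if possible).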

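The function is a generator function that returns all the cycles of an n-node digraph. However, a 12-node digraph has about 115 million cycles (each cycle is defined as a list of nodes; for example, [0,1,2,0] is a cycle). I need all the cycles to remain available for further processing, even after I have extracted some of their properties when they are first generated, so they have to be stored somewhere. The idea, therefore, is to cut the result array every 10 million cycles to avoid the memory error (Python runs out of RAM when the array gets too large) and to start a new array for the following results. For the 12-node digraph I would end up with 12 result arrays: 11 full ones (10 million cycles each) and a last one containing 5 million cycles.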

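Splitting the result array is not enough, however, because the variables stay in RAM. So I still need to write each array to disk and then delete it to free the RAM.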

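As has been pointed out, using "exec" to create variable names is not very "clean", and a dictionary would be the better solution. In my case, however, storing the results in a dictionary would run out of memory because of the size of the arrays, so I went the "exec" way. I would be grateful for your comments on that decision.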

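Also, to store the arrays I use numpy.savez_compressed, which produces a 43 MB file for each 10-million-cycle array. Uncompressed, the file is about 500 MB. However, the compressed version slows down the writing process. Do you know how to speed up the writing and/or the compression?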
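One knob that may be worth trying for the size/speed trade-off is the compression level. The sketch below is only an illustration under assumptions: it swaps numpy.savez_compressed for the standard library's gzip and pickle, and the helper names write_compressed and read_compressed are hypothetical. Lower levels generally write faster and produce larger files; the actual gain should be measured on real data.

```python
import gzip
import pickle


def write_compressed(obj, path, level=1):
    """Pickle obj into a gzip file.

    level ranges from 1 (fastest, least compression) to 9 (slowest, most
    compression). The default here favours write speed (an assumption).
    """
    with gzip.open(path, "wb", compresslevel=level) as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def read_compressed(path):
    """Load an object previously written by write_compressed."""
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)
```

Timing write_compressed at a few different levels on one real 10-million-cycle chunk would show whether the smaller file is worth the extra time.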

A simplified version of the code I wrote is as follows:

# Python 2 syntax: exec is a statement here
nbr_result_arrays=0
result_array_0=[]
result_length=10000000
tmp=result_array_0 # I use tmp to avoid using exec within the for loop (exec slows down code execution)
for cycle in generator:
    tmp.append(cycle)
    if len(tmp) == result_length:
        exec 'np.savez_compressed(\'results_' +str(nbr_result_arrays)+ '\', tmp)'
        exec 'del result_array_'+str(nbr_result_arrays)
        nbr_result_arrays+=1
        exec 'result_array_'+str(nbr_result_arrays)+'=[]'
        exec 'tmp=result_array_'+str(nbr_result_arrays)
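For comparison, here is a minimal sketch of the same chunking idea without exec and without numbered variables. Since only one chunk has to live in memory at a time, a single list that is rebound after each write is enough, and neither a dictionary of all chunks nor generated variable names are needed. The function names (dump_chunk, save_in_chunks) and the use of pickle instead of numpy.savez_compressed are assumptions made to keep the example self-contained. Unlike the simplified loop above, it also writes the last, partially filled chunk.

```python
import os
import pickle


def dump_chunk(chunk, out_dir, index):
    """Pickle one chunk to out_dir/results_<index>.pkl and return the path."""
    path = os.path.join(out_dir, "results_%d.pkl" % index)
    with open(path, "wb") as handle:
        pickle.dump(chunk, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def save_in_chunks(cycles, chunk_size, out_dir):
    """Consume an iterable of cycles, writing every chunk_size of them to disk.

    Only one chunk is held in memory at a time. Returns the paths written.
    """
    paths = []
    chunk = []
    for cycle in cycles:
        chunk.append(cycle)
        if len(chunk) == chunk_size:
            paths.append(dump_chunk(chunk, out_dir, len(paths)))
            chunk = []  # rebinding drops the reference to the written list
    if chunk:  # the last, partially filled chunk
        paths.append(dump_chunk(chunk, out_dir, len(paths)))
    return paths
```

Called as save_in_chunks(generator, 10000000, "some_directory"), this would produce results_0.pkl, results_1.pkl, and so on, one file per chunk.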

Thanks for reading.

Aleix

How about using …

Thank you all for your suggestions.

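As @Aya suggested, I think that to improve performance (and to avoid possible disk-space problems) I should not store the results on the hard drive at all: storing the results takes about half as long as generating them, so loading and processing them again would cost nearly as much as generating them again. Besides, not storing any results saves space, which could become a big problem for larger digraphs (a complete 12-node digraph has about 115 million cycles, while a complete 29-node digraph has about 848E27 cycles, and the count grows at a factorial rate).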

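The idea is that I first need to find all the cycles going through the weakest arc in order to compute the total probability of all cycles through it. Then, using this total probability, I have to go through all those cycles again and subtract them from the original array according to their weighted probability (I need the total probability to compute the weighted probability: weighted probability = probability of this cycle / total probability of the cycles through this arc).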
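To make the weighting concrete, here is a small worked example with made-up numbers. The function name allocate_arc_weight and the sample values are purely illustrative; it only shows the arithmetic of splitting the weakest arc's weight among the cycles that pass through it.

```python
def allocate_arc_weight(cycle_probs, arc_weight):
    """Split arc_weight among cycles in proportion to their probabilities."""
    total = sum(cycle_probs)  # total probability of the cycles through the arc
    return [p / total * arc_weight for p in cycle_probs]


# Three hypothetical cycles through an arc of weight 8.
shares = allocate_arc_weight([0.25, 0.125, 0.125], 8.0)
print(shares)  # [4.0, 2.0, 2.0]
```

The shares add up to the arc's weight, which is why the arc is emptied once every cycle through it has been subtracted.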

So I think this is the best way to go (but I am open to further discussion!).

However, I have doubts about the speed of two sub-functions:

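First: finding out whether a sequence contains a specific (smaller) sequence. I do this with the function "contains_sequence", which relies on the generator function "window" (as was suggested to me), but I have been told that doing it with a deque would be 33% faster. Any other ideas?
Second: I currently find the cycle probability of a cycle by sliding along the cycle's nodes (represented by a list) to find, at the output of each arc, the probability of staying within the cycle, and then multiplying them all together to get the cycle probability (the function is called find_cycle_probability). I would appreciate any performance suggestions for this function, since I need to run it for every cycle, i.e. countless times.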
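For the first point, a deque-based window might look like the sketch below. It is an illustration rather than a measured improvement: whether it is actually faster should be checked with timeit on real cycles. The helper contains_arc is a hypothetical shortcut for the case used here, where the sequence searched for is always a two-node arc.

```python
from collections import deque
from itertools import islice


def window(seq, n=2):
    """Yield a sliding window of width n over seq, reusing one deque.

    The same deque object is yielded each time, so callers must use it
    before asking for the next window.
    """
    it = iter(seq)
    win = deque(islice(it, n), maxlen=n)
    if len(win) == n:
        yield win
    for elem in it:
        win.append(elem)  # maxlen drops the oldest element automatically
        yield win


def contains_sequence(all_values, seq):
    """True if seq occurs as a contiguous run inside all_values."""
    target = deque(seq)  # a deque never compares equal to a list
    return any(win == target for win in window(all_values, len(seq)))


def contains_arc(cycle, arc):
    """Special case for a two-node arc: compare consecutive node pairs."""
    start, end = arc
    pairs = zip(cycle, islice(cycle, 1, None))
    return any(a == start and b == end for a, b in pairs)
```

For the cycle [0,1,2,0], both contains_sequence(cycle, [1,2]) and contains_arc(cycle, [1,2]) are true, while the reversed arc [2,1] is not found.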

Any other tips, suggestions, or comments are welcome! Thanks again for your help.

Aleix

Here is the simplified code:

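import itertools
import sys

import networkx
import numpy

import new_cycles  # assumed to be the author's local module providing simple_cycles_generator

def simple_cycles_generator_w_filters(working_array_digraph, arc):
    '''Generator function generating all cycles containing a specific arc.'''
    generator=new_cycles.simple_cycles_generator(working_array_digraph)
    for cycle in generator:
        if contains_sequence(cycle, arc):
            yield cycle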

def find_smallest_arc_with_cycle(working_array,working_array_digraph):
    '''Find the smallest arc through which at least one cycle flows.
    Returns:
        - if such arc exist:
            smallest_arc_with_cycle = [a,b] where a is the start of arc and b the end
            smallest_arc_with_cycle_value = x where x is the weight of the arc
        - if such arc does not exist:
            smallest_arc_with_cycle = []
            smallest_arc_with_cycle_value = 0 '''
    smallest_arc_with_cycle = []
    smallest_arc_with_cycle_value = 0
    sparse_array = []
    for i in range(numpy.shape(working_array)[0]):
        for j in range(numpy.shape(working_array)[1]):
            if working_array[i][j] !=0:
                sparse_array.append([i,j,working_array[i][j]])
    sorted_array=sorted(sparse_array, key=lambda x: x[2])
    for i in range(len(sorted_array)):
        smallest_arc=[sorted_array[i][0],sorted_array[i][1]]
        generator=simple_cycles_generator_w_filters(working_array_digraph,smallest_arc)
        if any(generator):
            smallest_arc_with_cycle=smallest_arc
            smallest_arc_with_cycle_value=sorted_array[i][2]
            break

    return smallest_arc_with_cycle,smallest_arc_with_cycle_value

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... """
    it = iter(seq)
    result = list(itertools.islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))


def find_cycle_probability(cycle, working_array, total_outputs):
    '''Finds the cycle probability of a given cycle within a given array'''
    output_prob_of_each_arc=[]
    for i in range(len(cycle)-1):
        weight_of_the_arc=working_array[cycle[i]][cycle[i+1]]
        output_probability_of_the_arc=float(weight_of_the_arc)/float(total_outputs[cycle[i]])#NOTE:total_outputs is an array, thus the float
        output_prob_of_each_arc.append(output_probability_of_the_arc)
    circuit_probabilities_of_the_cycle=numpy.prod(output_prob_of_each_arc)    
    return circuit_probabilities_of_the_cycle 

def clean_negligible_values(working_array):
    ''' Cleans the array by rounding negligible values to 0 according to a 
    pre-defined threshold.'''
    zero_threshold=0.000001
    for i in range(numpy.shape(working_array)[0]):
        for j in range(numpy.shape(working_array)[1]):
            if working_array[i][j] == 0:
                continue
            elif 0 < working_array[i][j] < zero_threshold:
                working_array[i][j] = 0
            elif -zero_threshold <= working_array[i][j] < 0:
                working_array[i][j] = 0
            elif working_array[i][j] < -zero_threshold:
                sys.exit('Error')
    return working_array

original_array= 1000 * numpy.random.random_sample((5, 5))
# NOTE: axis=0 sums each column; axis=1 would sum each row instead
total_outputs=numpy.sum(original_array,axis=0) + 100 * numpy.random.random_sample(5)

working_array=original_array.copy()
straight_array= working_array.copy()
cycle_array=numpy.zeros(numpy.shape(working_array))
iteration_counter=0
working_array_digraph=networkx.DiGraph(working_array)

[smallest_arc_with_cycle, smallest_arc_with_cycle_value]= find_smallest_arc_with_cycle(working_array, working_array_digraph) 

while smallest_arc_with_cycle: # using implicit true value of a non-empty list

    cycle_flows_to_be_subtracted = numpy.zeros(numpy.shape((working_array)))

    # FIRST run of the generator to calculate each cycle probability
    # note: the cycle generator ONLY provides all cycles going through 
    # the specified weakest arc    
    generator = simple_cycles_generator_w_filters(working_array_digraph, smallest_arc_with_cycle)
    nexus_total_probs = 0
    for cycle in generator:
        cycle_prob = find_cycle_probability(cycle, working_array, total_outputs)
        nexus_total_probs += cycle_prob

    # SECOND run of the generator
    # using the nexus_total_probs calculated before, I can allocate the weight of the 
    # weakest arc to each cycle going through it
    generator = simple_cycles_generator_w_filters(working_array_digraph,smallest_arc_with_cycle)
    for cycle in generator:
        cycle_prob = find_cycle_probability(cycle, working_array, total_outputs)
        allocated_cycle_weight = cycle_prob / nexus_total_probs * smallest_arc_with_cycle_value
        # create the array to be subtracted
        for i in range(len(cycle)-1):
            cycle_flows_to_be_subtracted[cycle[i]][cycle[i+1]] += allocated_cycle_weight 

    working_array = working_array - cycle_flows_to_be_subtracted
    clean_negligible_values(working_array)    
    cycle_array = cycle_array + cycle_flows_to_be_subtracted   
    straight_array = straight_array - cycle_flows_to_be_subtracted
    clean_negligible_values(straight_array)
    # find the next weakest arc with cycles.
    working_array_digraph=networkx.DiGraph(working_array)
    [smallest_arc_with_cycle, smallest_arc_with_cycle_value] = find_smallest_arc_with_cycle(working_array,working_array_digraph)
