python: improve performance and/or method to avoid memory errors when creating, saving and deleting variables

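I have been struggling with a function that kept giving me a memory error, and thanks to your support I managed to solve the problem. However, since I am not a professional programmer, I would like to ask for your opinion on my approach and on how to improve its performance (if possible).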

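The function is a generator function that returns all the cycles of an n-node directed graph. However, a 12-node digraph has about 115 million cycles (each cycle is defined as a list of nodes, e.g. [0,1,2,0] is a cycle). I need all the cycles to be available for further processing, even after I have extracted some of their properties when they are first generated, so they need to be stored somewhere. The idea, therefore, is to cut the result array every 10 million cycles to avoid a memory error (Python runs out of RAM when the array gets too big) and to create a new array to store the following results. For a 12-node digraph I would end up with 12 result arrays: 11 full ones (10 million cycles each) and a last one containing 5 million cycles.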
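
The splitting idea can be sketched with itertools.islice, which takes at most chunk_size items from the generator at a time, so only one chunk is held in memory at once. This is a minimal sketch: iter_chunks is a hypothetical helper name, and the toy generator stands in for the real cycle generator (25 items in chunks of 10 mirror the 11 full arrays plus one partial array described above).

```python
import itertools
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `chunk_size` items (hypothetical helper)."""
    it = iter(iterable)
    while True:
        # islice consumes at most chunk_size items from the shared iterator
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


# Toy stand-in for the cycle generator: 25 "cycles" split into chunks of 10
toy_cycles = ([i, i + 1, i] for i in range(25))
sizes = [len(chunk) for chunk in iter_chunks(toy_cycles, 10)]
print(sizes)  # [10, 10, 5]
```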

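However, splitting the result array is not enough, because the variables stay in RAM. So I still need to write each array to disk and then delete it to free the RAM.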

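As has been pointed out, using exec to create variable names is not very "clean", and a dictionary is the better solution. However, in my case, if I stored the results in a dictionary it would run out of memory because of the size of the arrays. So I chose the exec way. I would appreciate your opinion on this decision.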
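
Because each chunk is written out before the next one starts, a single list variable is arguably enough: rebinding it to a new empty list drops the only reference to the previous chunk, so neither exec nor a dictionary of arrays is needed. A minimal sketch, assuming cycles are lists of integer node indices; write_chunk, save_in_chunks and load_chunk are hypothetical names. Since cycles have different lengths, each chunk is stored as one flat node array plus an offsets array rather than as a ragged 2-D array (recent NumPy versions refuse to build ragged arrays unless dtype=object is given).

```python
import os
import tempfile

import numpy as np


def write_chunk(chunk, path):
    """Save variable-length cycles as a flat node array plus start offsets."""
    lengths = np.array([len(c) for c in chunk], dtype=np.int64)
    offsets = np.concatenate(([0], np.cumsum(lengths)))
    nodes = np.array([n for c in chunk for n in c], dtype=np.int64)
    np.savez_compressed(path, nodes=nodes, offsets=offsets)


def save_in_chunks(cycles, chunk_size, prefix):
    """Stream `cycles` to disk, `chunk_size` cycles per file, without exec."""
    paths = []
    buffer = []
    for cycle in cycles:
        buffer.append(cycle)
        if len(buffer) == chunk_size:
            path = "%s_%d.npz" % (prefix, len(paths))
            write_chunk(buffer, path)
            paths.append(path)
            buffer = []  # rebinding drops the only reference, so the old list can be freed
    if buffer:  # the last, partially filled chunk
        path = "%s_%d.npz" % (prefix, len(paths))
        write_chunk(buffer, path)
        paths.append(path)
    return paths


def load_chunk(path):
    """Rebuild the list of cycles stored by write_chunk."""
    with np.load(path) as data:
        nodes, offsets = data["nodes"], data["offsets"]
        return [nodes[offsets[k]:offsets[k + 1]].tolist() for k in range(len(offsets) - 1)]


with tempfile.TemporaryDirectory() as tmp_dir:
    toy_cycles = ([i % 5, (i + 1) % 5, i % 5] for i in range(25))
    paths = save_in_chunks(toy_cycles, 10, os.path.join(tmp_dir, "results"))
    last_chunk = load_chunk(paths[-1])
print(len(paths), len(last_chunk), last_chunk[0])  # 3 5 [0, 1, 0]
```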

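Also, to store the arrays I use numpy.savez_compressed, which produces a 43 MB file for each 10-million-cycle array. Uncompressed, it produces a 500 MB file. However, the compressed version slows down the writing process. Do you know how to speed up the writing and/or the compression?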
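
One hedged option is to shrink the data before it reaches the compressor: node indices of a 12-node digraph fit in a single byte, whereas NumPy's default integer dtype is 4 or 8 bytes depending on the platform. The sketch below packs cycles into the smallest unsigned dtype that holds the largest node index (numpy.min_scalar_type) and writes them with np.savez, which skips compression entirely. pack_cycles is a hypothetical helper, an in-memory buffer stands in for a file, and whether the smaller uncompressed files are fast enough and small enough for the real data is something to measure rather than assume.

```python
import io

import numpy as np


def pack_cycles(cycles, n_nodes):
    """Flat node array in the smallest unsigned dtype that fits, plus int64 offsets."""
    dtype = np.min_scalar_type(n_nodes - 1)  # largest node index is n_nodes - 1
    nodes = np.array([n for c in cycles for n in c], dtype=dtype)
    offsets = np.cumsum([0] + [len(c) for c in cycles], dtype=np.int64)
    return nodes, offsets


cycles = [[0, 1, 2, 0], [1, 2, 1], [0, 2, 3, 0]]
nodes, offsets = pack_cycles(cycles, n_nodes=12)
print(nodes.dtype, nodes.nbytes)  # uint8 11 (the same 11 nodes as int64 take 88 bytes)

buf = io.BytesIO()  # stands in for a file on disk
np.savez(buf, nodes=nodes, offsets=offsets)  # uncompressed: no zlib cost when writing
buf.seek(0)
with np.load(buf) as data:
    flat, starts = data["nodes"], data["offsets"]
    restored = [flat[starts[k]:starts[k + 1]].tolist() for k in range(len(starts) - 1)]
print(restored == cycles)  # True
```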

A simplified version of the code I wrote is as follows:

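import numpy as np

# `generator` is the cycle generator described above (not shown)
nbr_result_arrays = 0
result_array_0 = []
result_length = 10000000
tmp = result_array_0  # I use tmp to avoid using exec within the for loop (exec slows down code execution)
for cycle in generator:
    tmp.append(cycle)
    if len(tmp) == result_length:
        np.savez_compressed('results_' + str(nbr_result_arrays), tmp)
        exec('del result_array_' + str(nbr_result_arrays))
        nbr_result_arrays += 1
        exec('result_array_' + str(nbr_result_arrays) + ' = []')
        exec('tmp = result_array_' + str(nbr_result_arrays))
# save the last, partially filled array (e.g. the final 5 million cycles)
if tmp:
    np.savez_compressed('results_' + str(nbr_result_arrays), tmp)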

Thanks for reading.

Aleix

How about using


Thank you all for your suggestions!

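As @Aya suggested, I think that to improve performance (and possibly the space problem) I should avoid storing the results on the hard disk: storing them takes half as long as creating them, so loading and processing them again would take almost as long as creating them again. Moreover, if I don't store any results I save space, which could become a big problem for larger digraphs (a 12-node complete digraph has about 115 million cycles, while a 29-node complete digraph has about 848E27 cycles... and the number grows factorially).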

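My idea is that I first need to find all the cycles that go through the weakest arc, in order to compute the total probability of all the cycles through it. Then, using this total probability, I go through all these cycles again and subtract them from the original array according to their weighted probability (I need the total probability to compute the weighted probability: weighted probability = probability of this cycle / total probability through this edge).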
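
The allocation rule above can be illustrated with a small sketch. cycle_flows is a hypothetical helper; for clarity it takes the cycle probabilities as a list, whereas with millions of cycles the two-pass generator loop in the code below avoids holding them in memory. Because every cycle passed in goes through the weakest arc, the shares add up to that arc's weight, so the arc is driven to zero when the flows are subtracted.

```python
import numpy as np


def cycle_flows(cycles, cycle_probs, arc_weight, n_nodes):
    """Spread `arc_weight` over `cycles` in proportion to their probabilities.

    weighted probability = cycle probability / total probability through the arc
    """
    total = sum(cycle_probs)  # first pass: total probability through the weakest arc
    flows = np.zeros((n_nodes, n_nodes))
    for cycle, prob in zip(cycles, cycle_probs):  # second pass: allocate
        share = prob / total * arc_weight
        for a, b in zip(cycle, cycle[1:]):  # every arc of the cycle carries this share
            flows[a, b] += share
    return flows


cycles = [[0, 1, 0], [0, 1, 2, 0]]  # both pass through the weakest arc 0 -> 1
flows = cycle_flows(cycles, cycle_probs=[0.25, 0.75], arc_weight=8.0, n_nodes=3)
print(flows[0, 1], flows[1, 0], flows[1, 2], flows[2, 0])  # 8.0 2.0 6.0 6.0
```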

So I think this is the best approach (but I'm open to more discussion!).

However, I have doubts about the speed of two sub-functions:

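  • First: finding whether a sequence contains a specific (smaller) sequence. I do this with the function contains_sequence, which relies on the generator function window (as suggested elsewhere), but I have been told that a deque-based version would be 33% faster. Any other ideas?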

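  • Second: I currently compute a cycle's probability by sliding over the cycle's nodes (represented as a list) to find, at the output of each arc, the probability of staying within the cycle, and then multiplying them all together to get the cycle probability (the function is called find_cycle_probability). Any performance suggestions for this function would be appreciated, since I need to run it for every cycle, i.e. an enormous number of times.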
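
A hedged sketch of both sub-functions. Since the arc is always two nodes, a pair scan with zip is enough and avoids building a new list per window (window rebuilds one with result[1:] + [elem] at every step); for longer patterns, a window kept in collections.deque(maxlen=n) would avoid that re-slicing, which is presumably where the reported deque speed-up comes from. For the probability, the division by total_outputs can be done once per pass of the while loop, producing a per-arc probability matrix, so each cycle only needs a lookup and a product. contains_arc, transition_matrix and cycle_probability are hypothetical names. For cycles of a dozen nodes, NumPy's per-call overhead may be comparable to a plain Python loop, so both variants are worth timing (e.g. with the timeit module) on real cycles.

```python
import numpy as np


def contains_arc(cycle, arc):
    """True if arc[0] is immediately followed by arc[1] somewhere in `cycle`."""
    a, b = arc
    # zip(cycle, cycle[1:]) walks the consecutive pairs, i.e. the arcs of the cycle
    return any(u == a and v == b for u, v in zip(cycle, cycle[1:]))


def transition_matrix(working_array, total_outputs):
    """Entry [u, v] = working_array[u, v] / total_outputs[u] (each row divided by its node's total)."""
    return working_array / np.asarray(total_outputs, dtype=float)[:, None]


def cycle_probability(cycle, transition):
    """Product of the per-arc probabilities along `cycle` (a node list such as [0, 1, 2, 0])."""
    nodes = np.asarray(cycle)
    # fancy indexing fetches every arc's probability in one call
    return float(np.prod(transition[nodes[:-1], nodes[1:]]))


working_array = np.array([[0.0, 2.0, 0.0],
                          [0.0, 0.0, 3.0],
                          [1.0, 0.0, 0.0]])
total_outputs = np.array([4.0, 6.0, 2.0])
transition = transition_matrix(working_array, total_outputs)  # compute once, reuse for every cycle

print(contains_arc([0, 1, 2, 0], [2, 0]))           # True
print(contains_arc([0, 1, 2, 0], [0, 2]))           # False
print(cycle_probability([0, 1, 2, 0], transition))  # 0.125
```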

Any other tips, suggestions or comments are welcome! Thanks again for your help.

Aleix

Here is the simplified code:

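import itertools
import sys

import networkx
import numpy

import new_cycles  # my own module providing simple_cycles_generator (not shown)


def simple_cycles_generator_w_filters(working_array_digraph, arc):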
    '''Generator function generating all cycles containing a specific arc.'''
    generator=new_cycles.simple_cycles_generator(working_array_digraph)
    for cycle in generator:
        if contains_sequence(cycle, arc):             
            yield cycle
    return

def find_smallest_arc_with_cycle(working_array,working_array_digraph):
    '''Find the smallest arc through which at least one cycle flows.
    Returns:
        - if such arc exist:
            smallest_arc_with_cycle = [a,b] where a is the start of arc and b the end
            smallest_arc_with_cycle_value = x where x is the weight of the arc
        - if such arc does not exist:
            smallest_arc_with_cycle = []
            smallest_arc_with_cycle_value = 0 '''
    smallest_arc_with_cycle = []
    smallest_arc_with_cycle_value = 0
    sparse_array = []
    for i in range(numpy.shape(working_array)[0]):
        for j in range(numpy.shape(working_array)[1]):
            if working_array[i][j] !=0:
                sparse_array.append([i,j,working_array[i][j]])
    sorted_array=sorted(sparse_array, key=lambda x: x[2])
    for i in range(len(sorted_array)):
        smallest_arc=[sorted_array[i][0],sorted_array[i][1]]
        generator=simple_cycles_generator_w_filters(working_array_digraph,smallest_arc)
        if any(generator):
            smallest_arc_with_cycle=smallest_arc
            smallest_arc_with_cycle_value=sorted_array[i][2]
            break

    return smallest_arc_with_cycle,smallest_arc_with_cycle_value

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... """
    it = iter(seq)
    result = list(itertools.islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))


def find_cycle_probability(cycle, working_array, total_outputs):
    '''Finds the cycle probability of a given cycle within a given array'''
    output_prob_of_each_arc=[]
    for i in range(len(cycle)-1):
        weight_of_the_arc=working_array[cycle[i]][cycle[i+1]]
        output_probability_of_the_arc=float(weight_of_the_arc)/float(total_outputs[cycle[i]])#NOTE:total_outputs is an array, thus the float
        output_prob_of_each_arc.append(output_probability_of_the_arc)
    circuit_probabilities_of_the_cycle=numpy.prod(output_prob_of_each_arc)    
    return circuit_probabilities_of_the_cycle 

def clean_negligible_values(working_array):
    ''' Cleans the array by rounding negligible values to 0 according to a 
    pre-defined threshold.'''
    zero_threshold=0.000001
    for i in range(numpy.shape(working_array)[0]):
        for j in range(numpy.shape(working_array)[1]):
            if working_array[i][j] == 0:
                continue
            elif 0 < working_array[i][j] < zero_threshold:
                working_array[i][j] = 0
            elif -zero_threshold <= working_array[i][j] < 0:
                working_array[i][j] = 0
            elif working_array[i][j] < -zero_threshold:
                sys.exit('Error')    
    return working_array

original_array= 1000 * numpy.random.random_sample((5, 5))
# total output of node i = row sum (its outgoing arcs) plus a random extra amount
total_outputs=numpy.sum(original_array,axis=1) + 100 * numpy.random.random_sample(5)

working_array=original_array.copy()
straight_array= working_array.copy()
cycle_array=numpy.zeros(numpy.shape(working_array))
iteration_counter=0
working_array_digraph=networkx.DiGraph(working_array)

[smallest_arc_with_cycle, smallest_arc_with_cycle_value]= find_smallest_arc_with_cycle(working_array, working_array_digraph) 

while smallest_arc_with_cycle: # using implicit true value of a non-empty list

    cycle_flows_to_be_subtracted = numpy.zeros(numpy.shape((working_array)))

    # FIRST run of the generator to calculate each cycle probability
    # note: the cycle generator ONLY provides all cycles going through 
    # the specified weakest arc    
    generator = simple_cycles_generator_w_filters(working_array_digraph, smallest_arc_with_cycle)
    nexus_total_probs = 0
    for cycle in generator:
        cycle_prob = find_cycle_probability(cycle, working_array, total_outputs)
        nexus_total_probs += cycle_prob

    # SECOND run of the generator
    # using the nexus_total_probs calculated before, I can allocate the weight of the 
    # weakest arc to each cycle going through it
    generator = simple_cycles_generator_w_filters(working_array_digraph,smallest_arc_with_cycle)
    for cycle in generator:
        cycle_prob = find_cycle_probability(cycle, working_array, total_outputs)        
        allocated_cycle_weight = cycle_prob / nexus_total_probs * smallest_arc_with_cycle_value
        # create the array to be subtracted
        for i in range(len(cycle)-1):
            cycle_flows_to_be_subtracted[cycle[i]][cycle[i+1]] += allocated_cycle_weight 

    working_array = working_array - cycle_flows_to_be_subtracted
    clean_negligible_values(working_array)    
    cycle_array = cycle_array + cycle_flows_to_be_subtracted   
    straight_array = straight_array - cycle_flows_to_be_subtracted
    clean_negligible_values(straight_array)
    # find the next weakest arc with cycles.
    working_array_digraph=networkx.DiGraph(working_array)
    [smallest_arc_with_cycle, smallest_arc_with_cycle_value] = find_smallest_arc_with_cycle(working_array,working_array_digraph)
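
find_smallest_arc_with_cycle may enumerate a large number of cycles for each candidate arc (all of them, for an arc that lies on no cycle) just to learn whether any cycle uses it. An arc u -> v lies on a simple cycle exactly when v can reach u, so a reachability search answers the same question without listing cycles; networkx offers this as networkx.has_path(G, source, target), while the sketch below uses a small breadth-first search on the array to stay self-contained. It also replaces the element-by-element loops in find_smallest_arc_with_cycle and clean_negligible_values with NumPy operations. All function names are hypothetical, and the vectorized cleaner raises ValueError where the original calls sys.exit.

```python
from collections import deque

import numpy as np

ZERO_THRESHOLD = 1e-6  # same value as zero_threshold in clean_negligible_values


def clean_negligible_values_vectorized(array, threshold=ZERO_THRESHOLD):
    """In place: zero out values in [-threshold, threshold); reject values below -threshold."""
    if (array < -threshold).any():
        # the original calls sys.exit('Error'); raising is easier to handle
        raise ValueError("array contains values below -threshold")
    array[(array >= -threshold) & (array < threshold)] = 0.0
    return array


def arcs_sorted_by_weight(array):
    """Non-zero arcs as (row, col, weight), lightest first, ties kept in row-major order."""
    rows, cols = np.nonzero(array)
    weights = array[rows, cols]
    order = np.argsort(weights, kind="stable")
    return [(int(rows[k]), int(cols[k]), float(weights[k])) for k in order]


def arc_is_on_a_cycle(array, u, v):
    """Arc u -> v lies on a cycle exactly when v can reach u (breadth-first search)."""
    seen = {v}
    queue = deque([v])
    while queue:
        node = queue.popleft()
        if node == u:
            return True
        for nxt in np.nonzero(array[node])[0].tolist():
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def find_smallest_arc_with_cycle_fast(array):
    """Same return convention as find_smallest_arc_with_cycle: ([a, b], weight) or ([], 0)."""
    for u, v, weight in arcs_sorted_by_weight(array):
        if arc_is_on_a_cycle(array, u, v):
            return [u, v], weight
    return [], 0


a = np.array([[0.0, 5e-7, 3.0],
              [-5e-7, 0.0, 1.0],
              [2.0, 0.0, 0.0]])
clean_negligible_values_vectorized(a)
print(arcs_sorted_by_weight(a))              # [(1, 2, 1.0), (2, 0, 2.0), (0, 2, 3.0)]
print(find_smallest_arc_with_cycle_fast(a))  # ([2, 0], 2.0)
```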