Python: Optimizing a Reed-Solomon encoder (polynomial division)

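I am trying to optimize a Reed-Solomon encoder, which is in fact simply a polynomial division over the Galois field GF(2^8) (which simply means that every value stays in the range 0 to 255). In fact, the code is very similar to a Go implementation of the same operation.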

The polynomial division algorithm used here is synthetic division (also called Horner's method).

I have tried everything: numpy, pypy, cython. The best performance I got was with pypy and the following simple nested loop:

def rsenc(msg_in, nsym, gen):
    '''Reed-Solomon encoding using polynomial division, better explained at http://research.swtch.com/field'''
    msg_out = bytearray(msg_in) + bytearray(len(gen)-1)
    lgen = bytearray([gf_log[gen[j]] for j in xrange(len(gen))])

    for i in xrange(len(msg_in)):
        coef = msg_out[i]
        # coef = gf_mul(msg_out[i], gf_inverse(gen[0]))  // for general polynomial division (when polynomials are non-monic), we need to compute: coef = msg_out[i] / gen[0]
        if coef != 0: # coef 0 is normally undefined so we manage it manually here (and it also serves as an optimization btw)
            lcoef = gf_log[coef] # precaching

            for j in xrange(1, len(gen)): # optimization: can skip g0 because the first coefficient of the generator is always 1! (that's why we start at position 1)
                msg_out[i + j] ^= gf_exp[lcoef + lgen[j]] # equivalent (in Galois Field 2^8) to msg_out[i+j] += msg_out[i] * gen[j]

    # Recopy the original message bytes
    msg_out[:len(msg_in)] = msg_in
    return msg_out
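Can a Python optimization wizard give me some clues about how to get a speedup? My goal is at least a 3x speedup, but more would be awesome. Any approach or tool is acceptable, as long as it is cross-platform (works at least on Linux and Windows).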
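
The function above relies on gf_exp and gf_log tables and on a generator polynomial, none of which are shown in the question. Here is a minimal Python 3 sketch of how they could be built and how the result can be checked. It assumes generator element 3 and reducing polynomial 0x11b, because those reproduce the table prefixes quoted in the Cython answer. The helper names (init_tables, rs_generator_poly, gf_poly_eval) are illustrative and not from the original code.

```python
GF_PRIM = 0x11b  # assumption: field polynomial matching the quoted table prefix

def init_tables(prim=GF_PRIM):
    """Build exp/log tables; gf_exp is doubled so log sums need no modulo."""
    gf_exp = bytearray(512)
    gf_log = bytearray(256)
    v = 1
    for i in range(255):
        gf_exp[i] = v
        gf_log[v] = i
        v ^= v << 1            # multiply by the generator element 3
        if v & 0x100:          # reduce modulo the field polynomial
            v ^= prim
    for i in range(255, 512):
        gf_exp[i] = gf_exp[i - 255]
    return gf_exp, gf_log

gf_exp, gf_log = init_tables()

def gf_mul(a, b):
    """Multiply two field elements through the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return gf_exp[gf_log[a] + gf_log[b]]

def rs_generator_poly(nsym):
    """Monic product of (x + a^i) for i in 0..nsym-1, highest degree first."""
    g = [1]
    for i in range(nsym):
        root = gf_exp[i]
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= c                    # c * x
            nxt[j + 1] ^= gf_mul(c, root)  # c * root (addition is XOR)
        g = nxt
    return g

def rsenc(msg_in, nsym, gen):
    """Python 3 port of the loop above. Like the original, it assumes that
    no generator coefficient is zero (gf_log[0] is only a placeholder)."""
    msg_out = bytearray(msg_in) + bytearray(len(gen) - 1)
    lgen = bytearray(gf_log[c] for c in gen)
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            lcoef = gf_log[coef]
            for j in range(1, len(gen)):
                msg_out[i + j] ^= gf_exp[lcoef + lgen[j]]
    msg_out[:len(msg_in)] = msg_in
    return msg_out

def gf_poly_eval(poly, x):
    """Evaluate a polynomial (highest degree first) with Horner's method."""
    y = poly[0]
    for c in poly[1:]:
        y = gf_mul(y, x) ^ c
    return y

gen = rs_generator_poly(4)
codeword = rsenc(b"hello world", 4, gen)
# A valid codeword evaluates to zero at every root of the generator.
print([gf_poly_eval(codeword, gf_exp[i]) for i in range(4)])
```

Evaluating the codeword at the generator roots is a cheap sanity check to run whenever the inner loop is rewritten for speed.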

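Here is a small test script with some of the other alternatives I tried (the cython attempt is not included, since it was slower than native python!):

(Note: the alternatives are probably not fully correct, some of the indexes must be slightly off, but since they are slower I did not try to fix them.)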

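/UPDATE and goal of the bounty: I found a very interesting optimization trick that promises to speed up the computation a lot: precomputing the multiplication table. I updated the code above with the new function rsenc_precomp(). However, there is no gain at all in my implementation, it is even a bit slower: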

rsenc : total time elapsed 0.107170 seconds.
rsenc_precomp : total time elapsed 0.108788 seconds.
How can array lookups cost more than operations like addition or xor? Why does it work in ZFEC and not in Python?


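I will award the bounty to whoever can show me how to make this multiplication/addition lookup-table optimization work (faster than the xor and addition operations), or who can explain to me, with references or analysis, why this optimization cannot work here (using Python/PyPy/Cython/Numpy etc., I tried them all).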
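
For readers who want to experiment with the lookup-table idea, here is a hypothetical Python 3 reconstruction (the original rsenc_precomp() is not shown in this thread). It precomputes a 256x256 multiplication table so that the inner loop does one row lookup instead of an addition plus an exp lookup. The field parameters (generator 3, polynomial 0x11b) and the example divisor are assumptions made for illustration.

```python
def build_tables(prim=0x11b):
    """exp/log tables for GF(2^8); assumes generator 3 and polynomial 0x11b."""
    gf_exp = bytearray(512)
    gf_log = bytearray(256)
    v = 1
    for i in range(255):
        gf_exp[i] = v
        gf_log[v] = i
        v ^= v << 1          # multiply by the generator element 3
        if v & 0x100:
            v ^= prim
    for i in range(255, 512):
        gf_exp[i] = gf_exp[i - 255]
    return gf_exp, gf_log

gf_exp, gf_log = build_tables()

def build_mul_table():
    """mul_table[a][b] == a * b in GF(2^8); rows and columns for 0 stay zero."""
    table = [bytearray(256) for _ in range(256)]
    for a in range(1, 256):
        row, la = table[a], gf_log[a]
        for b in range(1, 256):
            row[b] = gf_exp[la + gf_log[b]]
    return table

gf_mul_table = build_mul_table()

def rsenc_logexp(msg_in, nsym, gen):
    """Log/exp version from the question (assumes nonzero gen coefficients)."""
    msg_out = bytearray(msg_in) + bytearray(len(gen) - 1)
    lgen = bytearray(gf_log[c] for c in gen)
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            lcoef = gf_log[coef]
            for j in range(1, len(gen)):
                msg_out[i + j] ^= gf_exp[lcoef + lgen[j]]
    msg_out[:len(msg_in)] = msg_in
    return msg_out

def rsenc_precomp(msg_in, nsym, gen, mul_table=gf_mul_table):
    """Same division, but each product comes from one precomputed row."""
    msg_out = bytearray(msg_in) + bytearray(len(gen) - 1)
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            row = mul_table[coef]      # all products coef * b
            for j in range(1, len(gen)):
                msg_out[i + j] ^= row[gen[j]]
    msg_out[:len(msg_in)] = msg_in
    return msg_out

# Example monic divisor with nonzero coefficients (illustrative values).
gen = [1, 8, 36, 120, 85]
msg = bytes(range(1, 60))
print(rsenc_precomp(msg, 4, gen) == rsenc_logexp(msg, 4, gen))
```

Both variants still pay for one interpreted subscript per product, so in an interpreter the table may not be cheaper than an integer addition the way it is in C. That is only one plausible explanation for the timings above, and it would need profiling to confirm.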

The following is 3x faster than pypy on my machine (0.04 s vs 0.15 s). Using Cython:

ctypedef unsigned char uint8_t # does not work with Microsoft's C Compiler: from libc.stdint cimport uint8_t
cimport cpython.array as array

cdef uint8_t[::1] gf_exp = bytearray([1, 3, 5, 15, 17, 51, 85, 255, 26, 46, 114, 150, 161, 248, 19,
   lots of numbers omitted for space reasons
   ...])

cdef uint8_t[::1] gf_log = bytearray([0, 0, 25, 1, 50, 2, 26, 198, 75, 199, 27, 104, 
    more numbers omitted for space reasons
    ...])

import cython

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.initializedcheck(False)
def rsenc(msg_in_r, nsym, gen_t):
    '''Reed-Solomon encoding using polynomial division, better explained at http://research.swtch.com/field'''

    cdef uint8_t[::1] msg_in = bytearray(msg_in_r) # have to copy, unfortunately - can't make a memory view from a read only object
    cdef int[::1] gen = array.array('i',gen_t) # convert list to array

    cdef uint8_t[::1] msg_out = bytearray(msg_in) + bytearray(len(gen)-1)
    cdef int j
    cdef uint8_t[::1] lgen = bytearray(gen.shape[0])
    for j in xrange(gen.shape[0]):
        lgen[j] = gf_log[gen[j]]

    cdef uint8_t coef,lcoef

    cdef int i
    for i in xrange(msg_in.shape[0]):
        coef = msg_out[i]
        if coef != 0: # coef 0 is normally undefined so we manage it manually here (and it also serves as an optimization btw)
            lcoef = gf_log[coef] # precaching

            for j in xrange(1, gen.shape[0]): # optimization: can skip g0 because the first coefficient of the generator is always 1! (that's why we start at position 1)
                msg_out[i + j] ^= gf_exp[lcoef + lgen[j]] # equivalent (in Galois Field 2^8) to msg_out[i+j] -= msg_out[i] * gen[j]

    # Recopy the original message bytes
    msg_out[:msg_in.shape[0]] = msg_in
    return msg_out
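This is just your fastest version with static types added (and checking the html from cython -a until the loops are no longer highlighted in yellow).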

A few brief notes:

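  • Cython prefers x.shape[0] to len(x).

  • Defining the memoryviews as [::1] makes it clear that they are contiguous in memory, which helps.

  • initializedcheck(False) is necessary to avoid lots of existence checks on the globally defined gf_exp and gf_log. (You might find that you can speed up your basic Python/PyPy code by creating a local variable reference for these and using that instead.)

  • I had to copy a couple of the input arguments. Cython cannot make a memoryview from a read-only object (in this case msg_in, a string; I could probably have just made it a char* though). Also, gen (a list) needs to be in something with fast element access.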
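
The local-variable tip from the notes above can be tried in plain Python with a sketch like the following (Python 3). The table builder, the two function names and the timing inputs are assumptions made for illustration. The divisor is an arbitrary monic polynomial used only for timing, not a real Reed-Solomon generator.

```python
import timeit

def build_tables(prim=0x11b):
    """exp/log tables for GF(2^8); assumes generator 3 and polynomial 0x11b."""
    gf_exp = bytearray(512)
    gf_log = bytearray(256)
    v = 1
    for i in range(255):
        gf_exp[i] = v
        gf_log[v] = i
        v ^= v << 1          # multiply by the generator element 3
        if v & 0x100:
            v ^= prim
    for i in range(255, 512):
        gf_exp[i] = gf_exp[i - 255]
    return gf_exp, gf_log

gf_exp, gf_log = build_tables()

def rsenc_globals(msg_in, nsym, gen):
    """Inner loop reads the module-level tables on every iteration."""
    msg_out = bytearray(msg_in) + bytearray(len(gen) - 1)
    lgen = bytearray(gf_log[c] for c in gen)
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            lcoef = gf_log[coef]
            for j in range(1, len(gen)):
                msg_out[i + j] ^= gf_exp[lcoef + lgen[j]]
    msg_out[:len(msg_in)] = msg_in
    return msg_out

def rsenc_locals(msg_in, nsym, gen):
    """Same loop, but the tables and len(gen) are bound to locals first."""
    exp, log = gf_exp, gf_log          # one global lookup each
    n_gen = len(gen)
    msg_out = bytearray(msg_in) + bytearray(n_gen - 1)
    lgen = bytearray(log[c] for c in gen)
    for i in range(len(msg_in)):
        coef = msg_out[i]
        if coef != 0:
            lcoef = log[coef]
            for j in range(1, n_gen):
                msg_out[i + j] ^= exp[lcoef + lgen[j]]
    msg_out[:len(msg_in)] = msg_in
    return msg_out

# Arbitrary monic divisor with nonzero coefficients, for timing only.
gen = [1] + [(7 * k) % 255 + 1 for k in range(32)]
msg = bytes(range(223))

for fn in (rsenc_globals, rsenc_locals):
    seconds = timeit.timeit(lambda: fn(msg, 32, gen), number=50)
    print(fn.__name__, round(seconds, 4))
```

Any difference depends on the interpreter and its version, so it is worth measuring on the actual target rather than assuming a gain.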


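Other than that, it is all fairly straightforward. (I have not tried any variations of it, having got it faster.) I am really quite impressed by how well PyPy does.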

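Alternatively, if you know C, I would recommend rewriting this Python function in plain C and calling it (say, with CFFI). At least then you know that you reach top performance in the inner loops of your function without needing to be aware of either PyPy or Cython tricks.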



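Building on DavidW's answer, here is the implementation I am currently using, which is about 20% faster thanks to nogil and parallel computation: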

from cython.parallel import parallel, prange

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.initializedcheck(False)
cdef rsenc_cython(msg_in_r, nsym, gen_t) :
    '''Reed-Solomon encoding using polynomial division, better explained at http://research.swtch.com/field'''

    cdef uint8_t[::1] msg_in = bytearray(msg_in_r) # have to copy, unfortunately - can't make a memory view from a read only object
    #cdef int[::1] gen = array.array('i',gen_t) # convert list to array
    cdef uint8_t[::1] gen = gen_t

    cdef uint8_t[::1] msg_out = bytearray(msg_in) + bytearray(len(gen)-1)
    cdef int i, j
    cdef uint8_t[::1] lgen = bytearray(gen.shape[0])
    for j in xrange(gen.shape[0]):
        lgen[j] = gf_log_c[gen[j]]

    cdef uint8_t coef,lcoef
    with nogil:
        for i in xrange(msg_in.shape[0]):
            coef = msg_out[i]
            if coef != 0: # coef 0 is normally undefined so we manage it manually here (and it also serves as an optimization btw)
                lcoef = gf_log_c[coef] # precaching

                for j in prange(1, gen.shape[0]): # optimization: can skip g0 because the first coefficient of the generator is always 1! (that's why we start at position 1)
                    msg_out[i + j] ^= gf_exp_c[lcoef + lgen[j]] # equivalent (in Galois Field 2^8) to msg_out[i+j] -= msg_out[i] * gen[j]

    # Recopy the original message bytes
    msg_out[:msg_in.shape[0]] = msg_in
    return msg_out
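I would still like this to be faster (in a real implementation, data is encoded at about 6.4 MB/s with n=255, n being the size of the message plus the codeword).

The main lead I have found towards a faster implementation is the LUT (lookup table) approach, which precomputes the multiplication and addition arrays. However, in my Python and Cython implementations the LUT approach is slower than computing the XOR and addition operations.

There are other approaches to implementing a faster RS encoder, but I have neither the ability nor the time to try them. I will leave them here as references for other interested readers: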

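  • "Fast software implementation of finite field operations", Cheng Huang and Lihao Xu, Washington University in St. Louis, Tech. Rep. (2003). Also a correct code implementation.
  • Luo, Jianqiang, et al. "Efficient software implementations of large finite fields GF(2^n) for secure storage applications." ACM Transactions on Storage (TOS) 8.1 (2012): 2.
  • "A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage", Plank, J. S., Luo, J., Schuman, C. D., Xu, L., and Wilcox-O'Hearn, Z., FAST. Vol. 9. 2009. Or the non-extended version: "A Performance Comparison of Open-Source Erasure Coding Libraries for Storage Applications", Plank and Schuman.
  • The source code of the ZFEC library, with the multiplication LUT optimization.
  • "Optimized Arithmetic for Reed-Solomon Encoders", Christof Paar (June 1997). In IEEE International Symposium on Information Theory (pp. 250-250). Institute of Electrical Engineers (IEEE).
  • "A Fast Algorithm for Encoding the (255,233) Reed-Solomon Code Over GF(2^8)", R. L. Miller, T. K. Truong, and I. S. Reed.
  • "Optimizing Galois Field arithmetic for diverse processor architectures and applications", Greenan, Kevin M., Ethan L. Miller, and Thomas J. E. Schwarz. Modeling, Analysis and Simulation of Computers and Telecommunication Systems, 2008. MASCOTS 2008. IEEE International Symposium on. IEEE, 2008.
  • Anvin, H. Peter. "The mathematics of RAID-6." (2007).
  • One of the only few implementations of Cauchy Reed-Solomon, which is said to be very fast.
  • "A logarithmic Boolean time algorithm for parallel polynomial division", Bini, D. and Pan, V. Y. (1987), Information Processing Letters, 24(4), 233-237. See also Bini, D. and V. Pan, "Fast...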