Python: my Lempel-Ziv implementation makes the encoding longer


python compression


I don't understand why my implementation produces a string that is longer than the input.

I implemented it according to the description in this document, and only that description.

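It is designed to work only on binary strings. If anyone can explain why it produces a string longer than the one it started with, I would be very grateful.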
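If the test strings are random, the expansion may come from the data rather than from a bug. Uniformly random bits contain no redundancy, so no lossless compressor can shrink them on average, and the pointer overhead of LZ78 typically makes its output longer. Below is a minimal sketch for checking this. It assumes the (pointer, bit) format that the code in this question produces: new phrase number j costs ceil(log2 j) pointer bits plus one literal bit, and a leftover tail that matches a known phrase costs only its pointer. `lz78_output_length` is a hypothetical helper that computes only the length of the output.

```python
import random


def lz78_output_length(bits):
    """Bits produced when `bits` ('0'/'1' string) is LZ78-encoded.

    Assumed format: new phrase number j costs ceil(log2 j) pointer bits
    plus one literal bit; a leftover tail matching a known phrase costs
    only its pointer. Only the length is computed, not the encoding.
    """
    seen = {""}               # phrases parsed so far; "" is phrase 0
    total = 0
    w = ""
    for c in bits:
        wc = w + c
        if wc in seen:
            w = wc            # keep extending the current match
            continue
        j = len(seen)         # number of the phrase being added
        total += (j - 1).bit_length() + 1   # (j-1).bit_length() == ceil(log2 j)
        seen.add(wc)
        w = ""
    if w:                     # pointer-only tail, numbered len(seen)
        total += (len(seen) - 1).bit_length()
    return total


rng = random.Random(0)
random_bits = "".join(rng.choice("01") for _ in range(10000))
repetitive_bits = "0" * 10000
print(lz78_output_length(random_bits))      # exceeds 10000 for random input
print(lz78_output_length(repetitive_bits))  # 1013
```

Under this format, a run of 10,000 zeros encodes to 1013 bits. The 10,000 random bits come out longer than they went in, so growing the input size alone does not make random data compress.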

Main encoder

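def LZ_encode(uncompressed):
    m=uncompressed
    dictionary=dict_gen(m)  # phrase table: {binary-digit key: phrase}
    # keys of phrases 1..N in parse order (note: this shadows the built-in list)
    list=[int(bin(i)[2:]) for i in range(1,len(dictionary))]
    pointer_bit=[]
    for k in list:
        # (key of the phrase's prefix, last bit of the phrase or '$' for a tail)
        pointer_bit=pointer_bit+[(str(chopped_lookup(k,dictionary)),dictionary[k][-1])]
    new_pointer_bit=pointer_length_correct(pointer_bit)  # fix each pointer's width
    list_output=[i for sub in new_pointer_bit for i in sub]
    if list_output[-1]=='$':  # a pointer-only tail has no literal bit
        output=''.join(list_output[:-1])
    else:
        output=''.join(list_output)
    return output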
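For comparison, here is a compact sketch of the same scheme. `lz78_encode` is a hypothetical function, not the asker's code. It assumes the format that the functions in this question implement. It keys a dict by phrase, so the membership test and the prefix lookup run in constant time instead of scanning `dictionary.values()`. It also computes the pointer width ceil(log2 j) with integer arithmetic.

```python
def lz78_encode(bits):
    """Sketch: LZ78-style encoding of a '0'/'1' string.

    New phrase number j (1-based) is written as a ceil(log2 j)-bit pointer
    to its prefix phrase followed by its last bit. A leftover tail that
    matches an existing phrase is written as a pointer only.
    """
    index = {"": 0}          # phrase -> phrase number; "" is phrase 0
    out = []
    w = ""
    for c in bits:
        wc = w + c
        if wc in index:
            w = wc           # keep extending the current match
            continue
        j = len(index)                  # number of the new phrase
        width = (j - 1).bit_length()    # == ceil(log2 j) for j >= 1
        out.append(format(index[w], "b").zfill(width) if width else "")
        out.append(c)                   # the literal bit
        index[wc] = j
        w = ""
    if w:                    # input ended inside an existing phrase
        width = (len(index) - 1).bit_length()
        out.append(format(index[w], "b").zfill(width))
    return "".join(out)
```

`lz78_encode("101")` gives `"10001"`, which matches a hand trace of `LZ_encode` on the same input. The 13-bit string `"1011010100010"` becomes 21 bits, an example of a short input expanding under this scheme.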

Component functions

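from math import ceil, log  # used by pointer_length_correct

def dict_gen(m): # Generates Dictionary
    # keys are ints whose decimal digits spell the phrase number in binary
    # (phrase 5 -> key 101); key 0 is the empty phrase
    dictionary={0:""}
    j=1
    w=""
    iterator=0
    l=len(m)
    for c in m:
        iterator+=1
        wc= str(str(w) + str(c))
        if wc in dictionary.values():  # linear scan over every phrase
            w=wc
            if iterator==l:
                # input ended inside a known phrase: store it with a '$' marker
                dictionary.update({int(bin(j)[2:]): wc+'$'})
        else:
            dictionary.update({int(bin(j)[2:]): wc})  # new phrase number j
            w=""
            j+=1
    return dictionary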

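def chopped_lookup(k,dictionary): # Returns entry number of shortened source string
    cut_source_string=dictionary[k][:-1]  # phrase minus its last bit (or its '$')
    for key, value in dictionary.items():  # .items() works in Python 2 and 3; .iteritems() is Python 2 only
        if value == cut_source_string:
            return key
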
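def pointer_length_correct(lst): # Takes the (pointer,bit) list and corrects the length of the pointer
    new_pointer_bit=[]
    for pair in lst:
        n=lst.index(pair)  # 0-based position, i.e. phrase n+1 (rescans lst each time)
        # phrase n+1 needs a pointer of exactly ceil(log2(n+1)) bits
        if len(str(pair[0]))>ceil(log(n+1,2)):
            # too long: strip leading digits (normally only the first phrase's '0')
            while len(str(pair[0]))!=ceil(log(n+1,2)):
                pair = (str(pair[0])[1:],pair[1])
        if len(str(pair[0]))<ceil(log(n+1,2)):
            # too short: left-pad with zeros
            while len(str(pair[0]))!=ceil(log(n+1,2)):
                pair = (str('0'+str(pair[0])),pair[1])
        new_pointer_bit=new_pointer_bit+[pair]
    return new_pointer_bit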
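`pointer_length_correct` computes the width with floating-point `log(n+1, 2)`. The Python documentation notes that `math.log2(x)` is usually more accurate than `log(x, 2)`, and an integer formula avoids rounding altogether, because ceil(log2 j) equals `(j - 1).bit_length()` for every j ≥ 1. The sketch below checks the float formula at powers of two and rewrites the padding and trimming loops as a single fixed-width format. `exact_width`, `float_width`, `fixed_width_pointer`, and `pointer_length_correct_exact` are hypothetical names. The last function assumes the pointers arrive as binary-digit strings, like the ones `chopped_lookup` returns.

```python
from math import ceil, log


def exact_width(j):
    """Pointer width for phrase number j (1-based): ceil(log2 j), integers only."""
    return (j - 1).bit_length()


def float_width(j):
    """The width formula used in pointer_length_correct, with n + 1 == j."""
    return int(ceil(log(j, 2)))


def fixed_width_pointer(k, j):
    """Write pointer k (0 <= k < j) in exactly exact_width(j) binary digits."""
    width = exact_width(j)
    return format(k, "b").zfill(width) if width else ""


def pointer_length_correct_exact(pairs):
    """Hypothetical replacement: pairs are (binary-digit pointer, bit) strings."""
    out = []
    for n, (pointer, bit) in enumerate(pairs):   # n is 0-based, phrase n + 1
        k = int(pointer, 2)                      # '101' read as binary -> 5
        out.append((fixed_width_pointer(k, n + 1), bit))
    return out


# Exponents e where the float formula disagrees at j = 2**e (possibly none).
print([e for e in range(1, 53) if float_width(2 ** e) != exact_width(2 ** e)])
```

Using `enumerate` also avoids `lst.index(pair)`, which rescans the list for every pair.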

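You can put code inside a formatted code block, for example by using backticks: `this is code`.

Your code is very messy. By the way, what happens in pointer_length_correct when len(str(pair[0])) == ceil(log(n+1,2))?

Please don't assign anything to `list`.

What input values have you tested?

The algorithm can produce an output string that is longer than the input string. If your concern is whether your code works, I suggest writing the corresponding decompression function, so that you can run a piece of data through both directions and make sure it comes out the way it went in.

@DYZ I know! I have almost no programming experience. In pointer_length_correct, if the equality holds, nothing needs to be done, because the pointer already has the correct length.

@Irisshpunk I have tested it with a range of binary strings of up to 10,000 bits! I know that for short input strings it can return a longer string, but as the input size grows it should start compressing very quickly. I have also written a decoding function, and it always decodes correctly.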
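The comments suggest a round-trip test. A decoder for this format can be sketched as follows. `lz78_decode` is a hypothetical function; the asker's own decoder is not shown. The sketch relies on the format ending with a pointer-only tail. If exactly one pointer's worth of bits remains when a new phrase starts, that last phrase has no literal bit.

```python
def lz78_decode(code):
    """Invert the (pointer, bit) format (sketch).

    Phrase j is read as a ceil(log2 j)-bit pointer plus one literal bit;
    if the input ends right after a pointer, that pointer is a tail.
    """
    phrases = [""]          # phrases[0] is the empty phrase
    out = []
    pos = 0
    while pos < len(code):
        j = len(phrases)                   # number of the phrase being read
        width = (j - 1).bit_length()       # ceil(log2 j)
        k = int(code[pos:pos + width], 2) if width else 0
        pos += width
        if pos == len(code):               # pointer-only tail
            out.append(phrases[k])
            break
        phrase = phrases[k] + code[pos]    # prefix phrase + literal bit
        pos += 1
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)
```

With any encoder for this format, such as `LZ_encode` from the question, the check is `lz78_decode(encoded) == original` over many random bit strings. A passing round trip shows that the code is correct, not that it compresses.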