Changing two characters into one symbol (Python)

python python-3.x file compression

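I am currently working on a file compression assignment for school, and I can't work out what is happening in this code (more specifically, what is not happening, and why).

In this part of the code my goal, in non-coding terms, is to change two adjacent identical letters into a single symbol so that they take up less memory: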

for i, word in enumerate(file_contents):
    # file_contents = LIST of words in any given text file

    word_contents = (file_contents[i]).split()
    for ind, letter in enumerate(word_contents[:-1]):
        if word_contents[ind] == word_contents[ind+1]:
            word_contents[ind] = ''
            word_contents[ind+1] = '★'

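However, when I run the full code on a sample text file, it does not seem to do what I told it to. For example, the word Sally should become Sa★y but stays unchanged. Can someone help me get on the right track?

Edit: I left out a crucial detail. I want the compressed string to end up in the original file_contents list, in place of the word that had the two letters, because the purpose of the full compression algorithm is to return a compressed version of the text in the input file.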

I suggest using a regex to match identical adjacent characters.

Example:

import re

txt = 'sally and bobby'
print(re.sub(r"(.)\1", '*', txt))

# sa*y and bo*y

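The loop and the condition checks in your code are not needed. Use the line below instead (note that re.sub expects a string, so word_contents must hold a single word here, not a list):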

word_contents = re.sub(r"(.)\1", '*', word_contents)
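Since file_contents in the question is a list of words, the substitution has to be applied to each word and the result stored back in the list. Here is a minimal sketch of that, assuming a small made-up word list in place of the real file and using ★ as in the question:

```python
import re

# Hypothetical sample data standing in for the words read from the input file.
file_contents = ["sally", "and", "bobby", "went", "running"]

# (.) captures any single character; \1 requires the same character again.
pair = re.compile(r"(.)\1")

# Build a new list of compressed words and rebind the name to it.
file_contents = [pair.sub("★", word) for word in file_contents]

print(file_contents)
```

Note that matches do not overlap, so a run of three identical letters such as "aaa" becomes "★a" rather than a single symbol.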

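A few things are wrong with your code (I think).

1) split produces a list, not a str, so when you write enumerate(word_contents[:-1]) it looks like you are assuming you get a string?!? Anyway... I'm not sure whether that is what you intended.

But then

2) These lines: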

if word_contents[ind] == word_contents[ind+1]:
    word_contents[ind] = ''
    word_contents[ind+1] = '★'

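Here you are operating on the list again. Clearly you want to operate on the string, or on a list of the characters, of the word being processed. In the best case this code does nothing; in the worst case you are corrupting the word_contents list.

So when you make the modification, you are modifying the word_contents list, not the slice [:-1] that you are actually iterating over. There are more problems, but I think this answers your question (I hope).

If you really want to understand what you did wrong, I suggest adding print statements as you go. If you are looking for someone to do your homework for you, I think someone else has already given you the answer.

Here is an example of how to add logging to your function: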
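To see point 1 concretely, here is a small check you can run. The word "Sally" is just a sample value, and the variable names are only for illustration:

```python
word = "Sally"

# split() breaks on whitespace, so a single word comes back as a one-item list.
as_words = word.split()
print(as_words)        # ['Sally']

# Slicing off the last item of a one-item list leaves nothing to loop over.
sliced = as_words[:-1]
print(sliced)          # []

# list() is what gives you the individual characters.
as_letters = list(word)
print(as_letters)      # ['S', 'a', 'l', 'l', 'y']
```

Because the slice is empty, the inner loop in the question never runs for a single word, which is why Sally comes out unchanged.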

for i, word in enumerate(file_contents):
    # file_contents = LIST of words in any given text file

    word_contents = (file_contents[i]).split()
    # See what the word content list actually is
    print(word_contents)
    # See what your slice is actually returning
    print(word_contents[:-1])
    # Unless something modifies your list elsewhere, you probably want to
    # iterate over the whole list and not just a slice of it.
    for ind, letter in enumerate(word_contents[:-1]):
        # See what your other test is testing
        print(word_contents[ind], word_contents[ind+1])
        # Here you probably actually want
        # word_contents[:-1][ind]
        # which is the list item you iterate over, and then the actual
        # string I suspect you get back
        if word_contents[ind] == word_contents[ind+1]:
            word_contents[ind] = ''
            word_contents[ind+1] = '★'

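Update: following the OP's follow-up question, I put together an annotated example program. Note that this is not an optimal solution; it is mainly an exercise in teaching flow control and the use of basic structures.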

# define the initial data...
file = "sally was a quick brown fox and jumped over the lazy dog which we'll call billy"
file_contents = file.split()

# Enumerate isn't needed in your example unless you intend to use the index later (example below)
for list_index, word in enumerate(file_contents):

    # Changing something you iterate over is dangerous and sometimes confusing.
    # In your case you iterated over word_contents and then modified it. If you
    # have to replace two characters, you change the index and size of the
    # structure, making changes potentially invalid. So we'll create a new data
    # structure to dump the results in.
    compressed_word = []

    # since we have a list of strings we'll just iterate over each string (or word) individually
    for character in word:
        # Check to see if there is any data in the intermediate structure yet if not there are no duplicate chars yet
        if compressed_word:
            # if there are chars in new structure, test to see if we hit same character twice 
            if character == compressed_word[-1]:
                # looks like we did, replace it with your star
                compressed_word[-1] = "*"
                # continue skips the rest of this iteration of the loop
                continue
        # if we haven't seen the character before or it is the first character just add it to the list
        compressed_word.append(character)

    # I guess this is one reason why you may want enumerate, to update the list with the new item?
    # join() is just converting the list back to a string
    file_contents[list_index] = "".join(compressed_word)

# prints the new version of the original "file" string
print(" ".join(file_contents))

Output:

sa*y was a quick brown fox and jumped over the lazy dog which we'* ca* bi*y

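I have a vague understanding of what I did wrong, but I would really appreciate a bit more help: could you give me an example of how to make the original file_contents list contain the symbols from word_contents? I am still a beginner at programming. I have already tried putting this directly after the code above

file_contents[ind] = word_contents[ind]

(after commenting out the print statements), but obviously that is not the right way to do it. I think I know what needs to be done; could you help me with the syntax?

Of course, I don't mind helping a bit more. My problem is that I'm not sure what file_contents actually is. I assume you already have a list of words?

Yes, file_contents is a list of the words in an input file.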
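On the syntax question: the attempted assignment uses ind, which is the position of a letter inside the word, whereas the slot to overwrite in file_contents is given by the outer index i. Here is a minimal sketch of the original loop with that change, assuming file_contents is a list of words (the sample list is made up) and using list(word) to get the characters:

```python
# Hypothetical sample data standing in for the words read from the input file.
file_contents = ["Sally", "went", "running"]

for i, word in enumerate(file_contents):
    # list(word) gives the individual characters; word.split() would not.
    letters = list(word)

    # Compare each character with its right-hand neighbour.
    for ind in range(len(letters) - 1):
        if letters[ind] == letters[ind + 1]:
            letters[ind] = ''
            letters[ind + 1] = '★'

    # Write the compressed word back using the OUTER index i, not ind.
    file_contents[i] = ''.join(letters)

print(file_contents)
```

Assigning to file_contents[i] is safe here because the length of the outer list never changes while it is being iterated; only the contents of one slot are replaced.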