Python 用符号序列替换字符串中的关键字_Python_String_Class_Replace

Python 用符号序列替换字符串中的关键字

python string class replace

Python 用符号序列替换字符串中的关键字,python,string,class,replace,Python,String,Class,Replace,作为练习，我必须创建一个简单的亵渎过滤器来了解类过滤器使用一系列令人反感的关键字和替换模板进行初始化。这些单词的每一次出现都应替换为模板生成的字符串。如果单词大小比模板短，则应使用从头到尾的子字符串。如果单词大小较长，则应根据需要重复模板下面是我迄今为止的结果，并举了一个例子 class ProfanityFilter: def __init__(self, keywords, template): self.__keywords = sorted(keywords

作为练习，我必须创建一个简单的亵渎过滤器来了解类

过滤器使用一系列令人反感的关键字和替换模板进行初始化。这些单词的每一次出现都应替换为模板生成的字符串。如果单词大小比模板短，则应使用从头到尾的子字符串。如果单词大小较长，则应根据需要重复模板

下面是我迄今为止的结果，并举了一个例子

class ProfanityFilter:

    def __init__(self, keywords, template):
        self.__keywords = sorted(keywords, key=len, reverse=True)
        self.__template = template

    def filter(self, msg):

        def __replace_letters__(old_word, replace_str):
            replaced_word = ""
            old_index = 0
            replace_index = 0
            while old_index <= len(old_word):
                if replace_index == len(replace_str):
                    replace_index = 0
                else:
                    replaced_word += replace_str[replace_index]
                    replace_index += 1

                old_index += 1

            return replaced_word

        for keyword in self.__keywords:
            idx = 0
            while idx < len(msg):
                index_l = msg.lower().find(keyword.lower(), idx)
                if index_l == -1:
                    break
                msg = msg[:index_l] + __replace_letters__(keyword, self.__template) + msg[index_l + len(keyword):]
                idx = index_l + len(keyword)

        return msg


f = ProfanityFilter(["duck", "shot", "batch", "mastard"], "?#$")
offensive_msg = "this mastard shot my duck"
clean_msg = f.filter(offensive_msg)
print(clean_msg)  # should be: "this ?#$?#$? ?#$? my ?#$?"

但它显示：

this ?#$?#$ ?#$? my ?#$?

出于某种原因，它将“mastard”一词替换为6个符号，而不是7个（每个字母一个）。它适用于其他关键字，为什么不适用于这个

另外，如果你看到任何其他不对劲的事情，请随时告诉我。请记住，我是初学者，我的“工具箱”非常小。

我会用正则表达式来代替，因为

re.sub（）

有一个方便的API用于动态替换：

重新导入
类别亵渎过滤器：
定义初始化（自我、关键字、模板）：
#构建一个正则表达式，它将匹配所有亵渎的单词
self.keyword_re=re.compile（“|”）
self.template=模板
def_生成_替换（自身、word）：
l=len（字）
#计算重复模板的次数
重复=（l//len（self.template））+1
#因为我们可能会得到一个比原来长的字符串，
#切成正确的长度。
返回（self.template*重复）[：l]
def过滤器（自身，msg）：
#将正则表达式的所有匹配项替换为
#动态计算的替换值。
返回self.keyword\u re.sub(
lambda m:self.\u生成\u替换（m.group（0）），
味精，
)
f=亵渎过滤器（[“鸭子”、“射击”、“批次”、“马斯塔德”]、“？#$”）
进攻性的\u msg=“这只母鸭射中了我的鸭子”
打印（f.过滤器（攻击性信息））

您的问题在于索引逻辑。你有两个错误

每次到达替换字符串的末尾时，都会跳过亵渎中的一个字母：

    while old_index <= len(old_word):
        if replace_index == len(replace_str):
            replace_index = 0
            # You don't replace a letter; you just reset the new index, but ...
        else:
            replaced_word += replace_str[replace_index]
            replace_index += 1

        old_index += 1     # ... but you still advance the old index.

输出：

?#$?#$?#$?# on ?#$ rulez!

输入字为13和2个字母；替换的是11和3

修复这两个错误：使

old_index

保持在范围内，并仅在进行替换时增加它

        while old_index < len(old_word):
            if replace_index == len(replace_str):
                replace_index = 0
            else:
                replaced_word += replace_str[replace_index]
                replace_index += 1
                old_index += 1

旧索引



未来的改进：

将其重构为for
循环
不要重置替换索引；事实上，摆脱它。只需使用旧索引%len（替换str）
无法生成一行程序，但这里有一个糟糕的实现。不要做VoNWooDSoN做的事：
编辑
所以，我猜3.8有赋值表达式。。。所以，但这将是一个班轮那么（可能）
?#$?#$?#$?# on ?#$ rulez!

        while old_index < len(old_word):
            if replace_index == len(replace_str):
                replace_index = 0
            else:
                replaced_word += replace_str[replace_index]
                replace_index += 1
                old_index += 1

def replace(msg, keywords=["duck", "shot", "batch", "mastard"], template="?#$"):                                                                                                                                        
    for keyword in keywords * len(msg)):                                                                                                                                                   
        msg = (template*len(keyword))[:len(keyword)].join([msg[:msg.find(keyword)], msg[msg.find(keyword)+len(keyword):]]) if msg.find(keyword) > 0 else msg                                                            
    return msg                                                                                                                                                                                                          

offensive_msg = "this mastard shot my duck"                                                                                                                                                                             
clean_msg = replace(offensive_msg)                                                                                                                                                                                      

print(clean_msg)  # should be: "this ?#$?#$? ?#$? my ?#$?"                                                                                                                                                              
print(clean_msg=="this ?#$?#$? ?#$? my ?#$?")

print ((lambda msg: [msg := (("?#$"*len(keyword))[:len(keyword)].join([msg[:msg.find(keyword)], msg[msg.find(keyword)+len(keyword):]]) if msg.find(keyword) > 0 else msg) for keyword in ["duck", "shot", "batch", "mastard"]])("this mastard shot my duck")[-1])