Python: replacing multiple entities in a text file using regular expressions


python regex python-3.x


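I have a structured text file containing multi-line records. Each record should have a unique key field. I need to read through a series of these files, find the key fields whose values are not unique, and replace those values with unique ones.

My script already identifies all the fields that need replacing. I store them in a dictionary in which each key is a non-unique field value and each value is a list of unique replacement values.

For example:

What I want to do is read each file only once, find every instance of "1111111111" (a dict key), and replace the first match with the first value in the list, the second match with the second value, and so on.

I tried using a regular expression, but I can't work out how to construct a suitable one without looping over the file multiple times.

This is my current code: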

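import re

def multireplace(Text, Vars):
    # Try longer keys first so a short key cannot shadow a longer one.
    dictSorted = sorted(Vars, key=len, reverse=True)
    regEx = re.compile('|'.join(map(re.escape, dictSorted)))
    # Each match is replaced by its dict value, which must be a string.
    return regEx.sub(lambda match: Vars[match.group(0)], Text)

text = multireplace(text, find_replace_dict)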

It works when each key maps to a single value, but it fails when the value is a list:

return regEx.sub(lambda match: Vars[match.group(0)], Text , 1)
TypeError: sequence item 1: expected str instance, list found
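The error comes from the callback rather than from the pattern: `re.sub` expects the replacement function to return a string, and here the lambda returns the whole list stored under the key. A minimal sketch that reproduces this, assuming made-up replacement values (only the key "1111111111" comes from the question; `try_sub` is a hypothetical helper):

```python
import re

# Hypothetical data: the replacement values are invented for illustration.
find_replace = {"1111111111": ["1111111112", "1111111113"]}
pattern = re.compile("|".join(map(re.escape, find_replace)))

def try_sub(text):
    """Return the substituted text, or the name of the exception raised."""
    try:
        # The lambda returns a list here, which re.sub cannot join into a str.
        return pattern.sub(lambda m: find_replace[m.group(0)], text)
    except TypeError as exc:
        return type(exc).__name__

print(try_sub("id=1111111111"))  # a key matches, so the callback runs
print(try_sub("no keys here"))   # nothing matches, so the callback never runs
```

The pattern itself compiles fine in both cases; the failure only appears once a match triggers the callback.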

Can the function be changed so that it doesn't have to loop over a file multiple times?

Take a look and read the comments. Let me know if anything doesn't make sense:

import re

def replace(text, replacements):
    # Copy each list as well: dict.copy() is shallow, so pop() below
    # would otherwise empty the caller's lists.
    replacements = {key: list(values) for key, values in replacements.items()}

    # This is essentially what you had already, including the longest-first
    # ordering so a short key cannot shadow a longer one.
    keys = sorted(replacements, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))

    # In our lambda, we pop the first element from the list. This way,
    # each time we're called with the same group, we'll get the next replacement.
    return regex.sub(lambda m: replacements[m.group(0)].pop(0), text)

print(replace("A A B B A B", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))

# Output:
# A1 A2 B1 B2 A3 B3
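One possible variation, offered as a sketch rather than as the answerer's code: give each key its own iterator and call `next()` on it, so the caller's lists are never modified. The name `replace_in_order` is hypothetical.

```python
import re

def replace_in_order(text, replacements):
    """Replace the n-th occurrence of each key with the n-th value in its list."""
    # One independent iterator per key; the caller's lists are left unchanged.
    pending = {key: iter(values) for key, values in replacements.items()}
    # Longest keys first so a key that is a prefix of another cannot shadow it.
    keys = sorted(pending, key=len, reverse=True)
    regex = re.compile("|".join(map(re.escape, keys)))
    # next() raises StopIteration when a key has more matches than values.
    return regex.sub(lambda m: next(pending[m.group(0)]), text)

table = {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}
print(replace_in_order("A A B B A B", table))
# Output:
# A1 A2 B1 B2 A3 B3
```

Because the iterators are built inside the function, the same dict can be reused for the next file without being rebuilt.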

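Update

To help with the problem described in the comments below, try this version, which tells you exactly which string ran out of replacements: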

import re

def replace(text, replacements):

    # Let's make a function so we can do a little more than the lambda.
    def make_replacement(match):
        try:
            return replacements[match.group(0)].pop(0)
        except IndexError:
            # Print out debug info about what happened
            print("Ran out of replacements for {}".format(match.group(0)))
            # Re-raise so the process still exits.
            raise

    # Copy each list as well: dict.copy() is shallow, so pop() above
    # would otherwise empty the caller's lists.
    replacements = {key: list(values) for key, values in replacements.items()}

    # This is essentially what you had already.
    regex = re.compile("|".join(map(re.escape, replacements.keys())))

    # make_replacement pops the first element from the list. This way,
    # each time it's called with the same group, we'll get the next replacement.
    return regex.sub(make_replacement, text)

# The text contains four "A"s but only three replacements, so this call fails.
print(replace("A A B B A B A", {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]}))

# Output:
# Ran out of replacements for A
# ...followed by a traceback ending in:
# IndexError: pop from empty list
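The `except` branch is also where a fallback could go instead of re-raising. A minimal sketch, assuming the desired policy is to leave an occurrence unchanged once its list is exhausted (that policy is an assumption, not something the question specifies, and `replace_with_fallback` is a hypothetical name):

```python
import re

def replace_with_fallback(text, replacements):
    # Copy the lists so the caller's dict is not emptied by pop().
    remaining = {key: list(values) for key, values in replacements.items()}
    regex = re.compile("|".join(map(re.escape, remaining)))

    def make_replacement(match):
        values = remaining[match.group(0)]
        # Assumed policy: when no values are left, keep the original text.
        return values.pop(0) if values else match.group(0)

    return regex.sub(make_replacement, text)

print(replace_with_fallback(
    "A A B B A B A",
    {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]},
))
# Output:
# A1 A2 B1 B2 A3 B3 A
```

Silently keeping the original value hides a data problem, so logging the exhausted key at that point may still be worthwhile.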

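This fails for me: return regex.sub(lambda m: replacements[m.group(0)].pop(0), text) raises IndexError: pop from empty list. I read the text from a file, opened with its full path in 'r' mode, using text = f.read(), and then pass the text to the replace function together with the dict. Your code only seems to work on a single line of text, though.

If you're getting IndexError: pop from empty list, it looks like you've run out of replacements for that particular string. (All of them have already been used.) See my edit for a different version of the code that will help you determine which string ran out of replacements. (It also gives you a good place to supply a default value or otherwise handle the error.)

Thank you. I found the problem with my dict. Take all the fake internet points I can give you!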
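On the single-line concern: `regex.sub` scans the whole string, newlines included, so text obtained from `f.read()` is processed in one pass. A sketch under stated assumptions: the record layout, the file name and the replacement values are invented for illustration, and `replace_once` is a hypothetical helper that mirrors the answer's function.

```python
import re
import tempfile
from pathlib import Path

def replace_once(text, replacements):
    # Copy the lists so the caller's dict survives the pop() calls.
    remaining = {key: list(values) for key, values in replacements.items()}
    regex = re.compile("|".join(map(re.escape, remaining)))
    return regex.sub(lambda m: remaining[m.group(0)].pop(0), text)

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical two-record file with a duplicated key.
    path = Path(tmp) / "records.txt"
    path.write_text("KEY 1111111111\nNAME a\nKEY 1111111111\nNAME b\n")

    text = path.read_text()  # the file is read exactly once
    updated = replace_once(text, {"1111111111": ["1111111112", "1111111113"]})
    path.write_text(updated)
    result = path.read_text()

print(result)
```

Each duplicate key on a different line receives the next value from its list, which matches the behaviour asked for in the question.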
