Python regex: replace text unless it is between quotes


python regex


I am working on a transpiler and want to replace the tokens of my language with Python tokens. The replacement is done like this:

for rep in reps:
    pattern, translated = rep

    # Replaces every [pattern] with [translated] in [transpiled]
    transpiled = re.sub(pattern, translated, transpiled, flags=re.UNICODE)

where reps is a list of ordered pairs (regex to be replaced, replacement string) and transpiled is the text to be transpiled.

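However, I can't seem to find a way to exclude text between quotes from the replacement process. Note that this is for a programming language, so it must also handle escaped quotes and single quotes.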
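To make the problem concrete, here is a minimal reproduction. The sample text and the replacement pair are made up for illustration; they are not from the actual transpiler.

```python
import re

# Hypothetical input: one bare token and the same word inside a string literal.
transpiled = 'foo = "foo"'
reps = [(r"foo", "bar")]

for pattern, translated in reps:
    # A plain re.sub() has no notion of quoting, so both occurrences change.
    transpiled = re.sub(pattern, translated, transpiled, flags=re.UNICODE)

print(transpiled)  # bar = "bar"
```

The desired result would be bar = "foo", with the string literal left alone.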

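This may depend on how you define your patterns, but in general you can always surround your pattern with lookahead and lookbehind groups so that text wrapped in quotes is not matched. Note that these lookarounds only inspect the characters directly adjacent to each match, so they protect a token that is itself wrapped in quotes but not a word in the middle of a longer quoted string: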
import re

transpiled = "A foo with \"foo\" and single quoted 'foo'. It even has an escaped \\'foo\\'!"

reps = [("foo", "bar"), ("and", "or")]

print(transpiled)  # before the changes

for rep in reps:
    pattern, translated = rep
    transpiled = re.sub("(?<=[^\"']){}(?=\\\\?[^\"'])".format(pattern),
                        translated, transpiled, flags=re.UNICODE)
    print(transpiled)  # after each change
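If words inside longer quoted strings also need protecting, one commonly used alternative is to match whole quoted strings first and hand them back unchanged from a replacement callback. The following is only a sketch of that idea, not the code from this answer: the names DOUBLE, SINGLE, QUOTED and sub_outside_quotes are hypothetical, and the replacement is treated as literal text (no backreferences).

```python
import re

# Complete double- or single-quoted strings, allowing backslash escapes inside.
DOUBLE = r'"(?:\\.|[^"\\])*"'
SINGLE = r"'(?:\\.|[^'\\])*'"
QUOTED = DOUBLE + "|" + SINGLE

def sub_outside_quotes(pattern, translated, text):
    """Replace pattern with the literal string translated, skipping quoted strings."""
    # Group 1 captures a quoted string; the second alternative is the user pattern.
    combined = re.compile("({})|(?:{})".format(QUOTED, pattern), flags=re.UNICODE)

    def repl(match):
        if match.group(1) is not None:
            return match.group(1)  # quoted string: return it untouched
        return translated          # real match outside quotes

    return combined.sub(repl, text)

print(sub_outside_quotes("foo", "bar", 'hey "foo and foo!" foo and \'foo\''))
# hey "foo and foo!" bar and 'foo'
```

Because the quoted-string alternative comes first, the regex engine consumes an entire string literal before it can try the pattern inside it. An unterminated quote is not treated as a string in this sketch, so text after it is still replaced.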

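As a bonus, the replace_non_quoted() function shown further down also accepts full regex patterns as replacements, provided its replace_multiple() helper applies them with re.sub() rather than str.replace():
print(replace_non_quoted("My foo and \"bar\" are like 'moo' and star!",
                        ((r"(\w+)oo", r"oo\1"), (r"(\w+)ar", r"ra\1"))))
# My oof and "bar" are like 'moo' and rast!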

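However, if your replacements do not involve patterns and you only need plain substitution, you can replace re.sub() in the replace_multiple() helper function with the significantly faster native str.replace().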
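The helper as listed on this page uses str.replace(), so a regex-based variant is not shown. A minimal sketch of what it might look like follows; the name replace_multiple_regex is hypothetical and chosen only to keep it apart from the str.replace() version.

```python
import re

def replace_multiple_regex(source, replacements):
    """Apply each (pattern, replacement) pair in order using re.sub()."""
    if not source:  # no need to process empty strings
        return ""
    for pattern, translated in replacements:
        source = re.sub(pattern, translated, source, flags=re.UNICODE)
    return source

print(replace_multiple_regex("My foo and star!",
                             ((r"(\w+)oo", r"oo\1"), (r"(\w+)ar", r"ra\1"))))
# My oof and rast!
```

Swapping this in for the str.replace() helper is what makes backreferences such as \1 work in the replacement strings.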

Finally, if you do not need complex patterns, you can drop regular expressions entirely:
QUOTE_STRINGS = ("'", "\\'", '"', '\\"')  # a list of substring considered a 'quote'

def replace_multiple(source, replacements):  # a convenience multi-replacement function
    if not source:  # no need to process empty strings
        return ""
    for r in replacements:
        source = source.replace(r[0], r[1])
    return source

def replace_non_quoted(source, replacements):
    result = []  # a store for the result pieces
    head = 0  # a search head reference
    eos = len(source)  # a convenience string length reference
    quote = None  # last quote match literal
    quote_len = 0  # a convenience reference to the current quote substring length
    while True:
        if quote:  # we already have a matching quote stored
            index = source.find(quote, head + quote_len)  # find the closing quote
            if index == -1:  # EOS reached
                break
            result.append(source[head:index + quote_len])  # add the quoted string verbatim
            head = index + quote_len  # move the search head after the quoted match
            quote = None  # blank out the quote literal
        else:  # the current position is not in a quoted substring
            index = eos
            # find the first quoted substring from the current head position
            for entry in QUOTE_STRINGS:  # loop through all quote substrings
                candidate = source.find(entry, head)
                if head <= candidate < index:  # <= so a quote right at the search head is not skipped
                    index = candidate
                    quote = entry
                    quote_len = len(entry)
            if not quote:  # EOS reached, no quote found
                break
            result.append(replace_multiple(source[head:index], replacements))
            head = index  # move the search head to the start of the quoted match
    if head < eos:  # if the search head is not at the end of the string
        result.append(replace_multiple(source[head:], replacements))
    return "".join(result)  # join back the result pieces and return them

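Rather than relying on regular expressions alone, you may want to use Python's built-in shlex module. It is designed to handle quoted strings the way a shell does, including nested cases. Example: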
import shlex
shlex.split("""look "nested \\"quotes\\"" here""")
# ['look', 'nested "quotes"', 'here']
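One way this could be applied to the replacement problem is to split in non-POSIX mode, which keeps the surrounding quotes on each quoted token, and then replace only in tokens that do not start with a quote. This is a rough sketch under several assumptions: the function name replace_unquoted_tokens is hypothetical, replacements are plain strings, and the original whitespace is collapsed to single spaces when the tokens are joined again.

```python
import shlex

def replace_unquoted_tokens(source, replacements):
    """Apply plain-string replacements to tokens that are not quoted."""
    # posix=False keeps the quote characters on quoted tokens.
    tokens = shlex.split(source, posix=False)
    out = []
    for token in tokens:
        if token[:1] not in ("'", '"'):  # unquoted token: safe to rewrite
            for old, new in replacements:
                token = token.replace(old, new)
        out.append(token)
    return " ".join(out)  # note: original spacing is not preserved

print(replace_unquoted_tokens('hey "foo and foo!" foo', [("foo", "bar")]))
# hey "foo and foo!" bar
```

shlex.split() raises ValueError on an unterminated quote, so real input would need that case handled.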

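I think I know what you mean, but to be sure, could you provide sample input with the current and the expected output? That usually makes it much easier for someone to answer.

Why not add lookahead/lookbehind groups consisting of ["'] to your regex patterns?

^ This is probably what you want ^ Some info here:

Thanks, but it doesn't seem to work... For example: import re; transpiled = ('hey "foo and foo!"'); reps = [(("foo!"), "bar"), ("and", "or")]; print(transpiled)  # before the changes; for rep in reps: pattern, translated = rep; transpiled = re.sub((?…

Thanks again for spending so much time and effort on this. Unfortunately, this solution still doesn't seem to work... The (slightly modified) code produces TypeError: sequence item 0: expected string, NoneType found. I have been trying for quite a while, so your help really matters. Thanks in advance!

@Lucca - you never return source from your replace_multiple() function. Also, keep QUOTE_STRINGS outside the function so it is not rebuilt every time you run your patterns.

Yes! Thank you so much!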