How do I replace the nth occurrence of a needle in a haystack? (Python)

python regex replace

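I am trying to replace the nth occurrence of a needle in a haystack. I would like to do this with re.sub() alone, but I cannot come up with a suitable regular expression. I have been trying to adapt an existing pattern, but I suspect my attempt fails when the haystack spans multiple lines.

My current approach is iterative: after each mutation it searches again from the very beginning to find the position of each occurrence. That is quite inefficient, and I would appreciate some input. Thanks!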

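Could you do it with re.findall together with MatchObject.start() and MatchObject.end()?

That is: use .findall to find every occurrence of the pattern in the string, use .start/.end to get the indices of the nth occurrence, and use those indices to build a new string containing the replacement value? (In practice this means re.finditer, since re.findall returns plain strings rather than match objects.)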
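The index-based idea in the comment above can be sketched roughly as follows. The function name replace_nth is made up for illustration, the replacement is treated as literal text, and re.finditer is used because it yields match objects with start() and end().

```python
import re


def replace_nth(pattern, replacement, string, n):
    """Replace the nth (1-based) match of pattern with literal replacement text."""
    for index, match in enumerate(re.finditer(pattern, string), start=1):
        if index == n:
            # Splice the replacement in between the text before and after the match.
            return string[:match.start()] + replacement + string[match.end():]
    # Fewer than n matches: return the input unchanged.
    return string


print(replace_nth(r"cat", "dog", "cat cat cat", 2))
```

The loop stops at the nth match, so the rest of the haystack is never scanned.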

I think you mean re.sub. You can pass it a function and keep track of how many times that function has been called so far:

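def replaceNthWith(n, replacement):
    count = 0  # number of matches seen so far
    def replace(match):
        nonlocal count
        count += 1
        # Replace only the nth match; return every other match unchanged.
        return replacement if count == n else match.group(0)
    return replace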

Usage:

re.sub(pattern, replaceNthWith(n, replacement), string)

This approach feels a bit hacky, though; there may be a more elegant way.

A regular expression along these lines should help, although I am not sure how efficient it is:

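import re

# N = 3: lazily skip the first two occurrences, then replace the third.
re.sub(
  r'^((?:.*?mytexttoreplace){2}.*?)mytexttoreplace',
  r'\1yourreplacementtext.',  # raw string, so that \1 is a group reference
  'mystring',
  flags=re.DOTALL
)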

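The DOTALL flag is important: without it, the dot does not match newlines, so the pattern cannot skip over occurrences that sit on earlier lines.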
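To use the same pattern for an arbitrary n, it can be built programmatically. The sketch below assumes the needle is a literal string (hence re.escape); the function name replace_nth_with_regex is invented for illustration.

```python
import re


def replace_nth_with_regex(needle, replacement, haystack, n):
    """Replace the nth occurrence of a literal needle using a single re.sub call."""
    if n < 1:
        raise ValueError("n must be at least 1")
    escaped = re.escape(needle)
    # Lazily skip n - 1 occurrences, capturing everything before the nth one.
    pattern = r'^((?:.*?%s){%d}.*?)%s' % (escaped, n - 1, escaped)
    # A callable replacement keeps backslashes in `replacement` literal.
    return re.sub(pattern, lambda m: m.group(1) + replacement, haystack,
                  count=1, flags=re.DOTALL)


print(replace_nth_with_regex("ab", "X", "ab\nab\nab ab", 3))
```

If the haystack contains fewer than n occurrences, the pattern does not match and the haystack comes back unchanged.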

I struggled with this for a while, but I found a solution that I think fits very well:

>>> import re
>>> def nth_matcher(n, replacement):
...     def alternate(n):
...         # Infinite generator: True on every nth value, False otherwise.
...         i = 0
...         while True:
...             i += 1
...             yield i % n == 0
...     gen = alternate(n)
...     def match(m):
...         # Advance the generator once per match that re.sub finds.
...         replace = next(gen)
...         if replace:
...             return replacement
...         else:
...             return m.group(0)
...     return match
...
>>> re.sub("([0-9])", nth_matcher(3, "X"), "1234567890")
'12X45X78X0'

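Edit: the matcher consists of two parts:

The alternate(n) function. It returns a generator that yields an infinite sequence of True/False values in which every nth value is True. Think of it as list(alternate(3)) == [False, False, True, False, False, True, False, ...].

The match(m) function. This is the function that gets passed to re.sub: it takes the next value from the alternate(n) generator (by advancing gen), and if that value is True it replaces the matched text; otherwise it leaves the match unchanged (replaces it with itself).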
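Because alternate(n) never terminates, calling list() on it directly would not return. A small sketch of how the first few values can be inspected, using itertools.islice to cut the sequence off (the cutoff of 7 is arbitrary):

```python
from itertools import islice


def alternate(n):
    """Yield False, False, ..., True forever, with True on every nth value."""
    i = 0
    while True:
        i += 1
        yield i % n == 0


# Take only the first seven values of the infinite sequence.
first_seven = list(islice(alternate(3), 7))
print(first_seven)  # [False, False, True, False, False, True, False]
```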

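I hope this is clear enough. If my explanation is unclear, please say so and I will improve it.

If the pattern (the "needle") or the replacement is a complex regular expression, you cannot assume anything about it. The function nth_occurrence_sub is a more general solution that I came up with: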

import re


def nth_match_end(pattern, string, n, flags):
    # Return the end index of the nth (1-based) match, or None if there is none.
    for i, match_object in enumerate(re.finditer(pattern, string, flags)):
        if i + 1 == n:
            return match_object.end()


def nth_occurrence_sub(pattern, repl, string, n=0, flags=0):
    max_n = len(re.findall(pattern, string, flags))
    if abs(n) > max_n or n == 0:
        return string
    if n < 0:
        # A negative n counts from the last occurrence.
        n = max_n + n + 1
    sub_n_times = re.sub(pattern, repl, string, count=n, flags=flags)
    if n == 1:
        return sub_n_times
    nm1_end = nth_match_end(pattern, string, n - 1, flags)
    sub_nm1_times = re.sub(pattern, repl, string, count=n - 1, flags=flags)
    # Length of the substituted text that corresponds to string[:nm1_end];
    # computed from lengths so it also works when nm1_end == len(string).
    sub_nm1_len = len(sub_nm1_times) - len(string[nm1_end:])
    components = [
        string[:nm1_end],
        sub_n_times[sub_nm1_len:]
        ]
    return ''.join(components)
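For comparison, here is a shorter sketch with the same interface (negative n counts from the end, out-of-range n leaves the string alone). It is not the author's implementation: the name sub_nth_match is invented, it assumes repl is a template string rather than a callable, and it collects all matches up front and expands the template with Match.expand().

```python
import re


def sub_nth_match(pattern, repl, string, n, flags=0):
    """Substitute only the nth match; a negative n counts from the last match."""
    matches = list(re.finditer(pattern, string, flags))
    if n == 0 or abs(n) > len(matches):
        return string
    match = matches[n - 1] if n > 0 else matches[n]
    # expand() resolves group references such as \1 in the template string.
    return string[:match.start()] + match.expand(repl) + string[match.end():]


print(sub_nth_match(r"(\w)(\d)", r"\2\1", "a1 b2 c3", 2))
print(sub_nth_match(r"(\w)(\d)", r"\2\1", "a1 b2 c3", -1))
```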

I wrote a similar function to do this. I was trying to replicate the SQL REGEXP_REPLACE() function. This is what I ended up with:

import re

def sql_regexp_replace(txt, pattern, replacement='', position=1, occurrence=0, regexp_modifier='c'):
    class ReplWrapper:
        def __init__(self, replacement, occurrence):
            self.count = 0
            self.replacement = replacement
            self.occurrence = occurrence
        def repl(self, match):
            self.count += 1
            # occurrence == 0 means "replace every match".
            if self.occurrence == 0 or self.occurrence == self.count:
                # expand() resolves group references such as \1 in the template.
                return match.expand(self.replacement)
            return match.group(0)
    occurrence = 0 if occurrence < 0 else occurrence
    # regexp_flags() is the author's own helper and is not shown here;
    # presumably it maps the modifier string to re flags.
    flags = regexp_flags(regexp_modifier)
    rx = re.compile(pattern, flags)
    replw = ReplWrapper(replacement, occurrence)
    # position is 1-based: the text before it is left untouched.
    return txt[0:position-1] + rx.sub(replw.repl, txt[position-1:])

One important point I did not mention: you need to return match.expand(), otherwise \1 templates will not be expanded properly and will be treated as literal text.
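The difference is easy to see in a small sketch. When the replacement passed to re.sub is a function, its return value is inserted as-is, so group references must be expanded explicitly. The pattern and sample text below are made up for illustration.

```python
import re

PATTERN = r"(\w+)@(\w+)"
TEMPLATE = r"\2 at \1"


def literal(match):
    # The returned string is inserted verbatim; \2 and \1 stay as text.
    return TEMPLATE


def expanded(match):
    # expand() substitutes the captured groups into the template.
    return match.expand(TEMPLATE)


without_expand = re.sub(PATTERN, literal, "user@example")
with_expand = re.sub(PATTERN, expanded, "user@example")
print(without_expand)
print(with_expand)
```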

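If you want to get this working, you will need to handle the flags differently, or supply regexp_flags() yourself. It is simple to implement, and for a test you can stub it out by setting flags to a constant and ignoring my call to regexp_flags().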

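Are you sure this is inefficient?

I looked at re.sub, but it does not seem to offer a way to replace the nth occurrence, only all occurrences or the first X occurrences. So the obvious steps of using findall/start & end seem (to me) simpler and clearer than coaxing it into working the way I want.

@Matt: You are right, there is no built-in way to do it. By passing a function you can get the desired effect. It may not be efficient, though, because it actually replaces every occurrence (mostly with itself).

This is interesting, although I am not quite sure how it works. I can tell from the result that it replaces every third occurrence until the end of the haystack. If I understood the details of how it works, I could add a constraint that stops it after one successful replacement. Could you add some explanation?

This looks like it may be the best answer. I would like to combine it with another answer that explains generators well.

Thanks to your explanation and another question, I now have a solid understanding of how this works. Also, is there a closure going on here?

@b Yes. The gen variable is the trick: it is used inside match, but initialized in the enclosing scope.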
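A short sketch of what that closure means in practice, assuming the nth_matcher design from the answer (condensed here): each call to nth_matcher creates its own generator, and a matcher that is reused keeps counting where it left off.

```python
import re


def nth_matcher(n, replacement):
    def alternate(n):
        i = 0
        while True:
            i += 1
            yield i % n == 0
    gen = alternate(n)  # created once per nth_matcher call, in the enclosing scope

    def match(m):
        # match closes over gen, so the count persists between calls.
        return replacement if next(gen) else m.group(0)
    return match


matcher = nth_matcher(3, "X")
first = re.sub(r"[0-9]", matcher, "12345")    # consumes values 1 to 5
second = re.sub(r"[0-9]", matcher, "12345")   # continues with values 6 to 10
fresh = re.sub(r"[0-9]", nth_matcher(3, "X"), "12345")  # new generator, starts over
print(first, second, fresh)
```

Creating a new matcher for every re.sub call avoids the carried-over count.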