Python regex to remove matching brackets in a file


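I have a LaTeX file in which a lot of text is marked with \red{}, but there may also be braces inside \red{}, for example \red{here is \underline{underlined} text}. I want to remove the red color, and after some googling I wrote the following Python script: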

import re, sys
# Start the program in a terminal with
#   python RedRemover.py filename
# sys.argv[1] then holds the value filename
ifn = sys.argv[1]
# Open the file and read its whole content into the string c
with open(ifn, "r") as f:
    c = f.read()
# Remove occurrences of \red{...} in c
c = re.sub(r'\\red\{(?:[^\}|]*\|)?([^\}|]*)\}', r'\1', c)
# Write c into a new file
with open("RedRemoved_" + ifn, "w") as nf:
    nf.write(c)

But this changes

\red{here is \underline{underlined} text}

to

here is \underline{underlined text}

which is not what I want. I want

here is \underline{underlined} text

I think you need to keep the curlies; consider this case: \red{\bf test}:

import re

c = r'\red{here is \underline{underlined} text} and \red{more}'
d = c

# this may be less painful and sufficient, and even more correct
c = re.sub(r'\\red\b', r'', c)
print("1ST:", c)

# if you want to get rid of the curlies (handles one level of nested braces):
d = re.sub(r'\\red{([^{]*(?:{[^}]*}[^}]*)*)}', r'\1', d)
print("2ND:", d)

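This gives:

1ST: {here is \underline{underlined} text} and {more}
2ND: here is \underline{underlined} text and more

You cannot match brackets nested to an arbitrary depth with the re module, because it does not support recursion. To work around this, you can use the regex module:

import regex

c = r'\red{here is \underline{underlined} text}'

c = regex.sub(r'\\red({((?>[^{}]+|(?1))*)})', r'\2', c)

where (?1) is a recursive call to capture group 1.

Thanks for your answer! However, this does not work if the red text occurs more than once, as in c = r'\red{here is \underline{underlined} text} and \red{more}'. I also appreciate the comment about keeping the curly braces, but the case you mention does not occur in my file. Thanks a lot!

I had to do a pip install regex, and then it worked like a charm.

@user2609987: Indeed, it is not installed by default.

@ridgerunner: Thanks. Curly braces often do not need to be escaped when the context is unambiguous (that is, when they do not enclose an integer or two integers). Many flavors do not treat the closing curly brace as a special character. Capture group 2 is used for the replacement.

Nice answer, +1 (but I would still escape those curly braces. ^)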
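
If installing the regex module is not an option, arbitrarily nested braces can also be handled with a small brace-counting scan using only the standard library. The following is a minimal sketch, not taken from the answers above: the function name strip_red is hypothetical, and it assumes that \red is immediately followed by { and that the file contains no escaped braces (\{ or \}).

```python
def strip_red(text, macro="\\red"):
    """Remove every macro{...} wrapper, keeping its contents (nested braces allowed)."""
    opener = macro + "{"
    out = []
    i = 0
    while i < len(text):
        if text.startswith(opener, i):
            # Scan forward to the brace that closes this \red{...}
            depth = 1
            j = i + len(opener)
            while j < len(text) and depth > 0:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                j += 1
            if depth == 0:
                # Recurse so that a \red nested inside another \red is removed too
                out.append(strip_red(text[i + len(opener):j - 1], macro))
                i = j
                continue
            # Unbalanced braces: keep this \red{ as it is and carry on
        out.append(text[i])
        i += 1
    return "".join(out)


c = r'\red{here is \underline{underlined} text} and \red{more}'
print(strip_red(c))  # here is \underline{underlined} text and more
```

Unlike the first regex above, this handles any nesting depth and multiple occurrences in one pass; if the braces of a \red{ are unbalanced, that occurrence is left untouched.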