Python: remove parts of a formatted string using regex


python regex


I have a string in this format:

(header1:content1(note1, note2),content2(note3),content3)-(header2:content)-(header3)

Now I want to remove all of the content. The expected output is:

(header1)-(header2)-(header3)

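How can I do that? I tried a few regular expressions, but the output was not correct.

Update 1: header, content, and note can contain any character except ( and ).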

Update 2: @adsmith solved my original problem. Now my string has the following format:

normalcontent1-(header1:content1(note1, note2),content2(note3),content3)-(header2:content)-normalcontent2-(header3)

Expected output:

normalcontent1-(header1)-(header2)-normalcontent2-(header3)
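For inputs shaped like the two examples above, where notes nest only one level deep inside a group, a single `re.sub` may be enough. The following is a minimal sketch under that assumption, not code taken from any answer on this page. The name `strip_contents` is hypothetical, and the pattern assumes that the first `:` inside a group separates the header from its contents.

```python
import re

# One parenthesized group: "(" header [":" contents] ")".
# Assumption: contents hold plain characters or one level of "(note, ...)" groups.
GROUP = re.compile(r"\(([^:()]*)(?::(?:[^()]|\([^()]*\))*)?\)")

def strip_contents(text):
    """Replace every matched group with just its header in parentheses."""
    return GROUP.sub(r"(\1)", text)

original = "(header1:content1(note1, note2),content2(note3),content3)-(header2:content)-(header3)"
updated = "normalcontent1-" + original.replace("-(header3)", "-normalcontent2-(header3)")

print(strip_contents(original))
print(strip_contents(updated))
```

Text outside parentheses is never matched, so the `normalcontent` parts pass through unchanged. Deeper nesting than one level falls outside what this pattern is meant to handle.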

def getheaders(text):
    elements = re.split("(?) …

Here is an example:

Prints:

normalcontent1-(header1)-(header2)-normalcontent2-(header3)
normalcontent1-(header)-normalcontent2-normalcontent3-(header2)

If you define the grammar correctly, a parser will be a more robust solution than a regular expression.
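To illustrate why tracking nesting explicitly is more robust, here is a minimal sketch of a hand-written scanner that counts parenthesis depth. It is a simpler technique swapped in for a full grammar, not a parser in the strict sense. The name `keep_headers` is hypothetical, and it assumes the first `:` at depth 1 ends the header.

```python
def keep_headers(text):
    """Copy text, dropping each top-level group's contents (from ':' to its ')')."""
    out = []
    depth = 0         # current parenthesis nesting level
    skipping = False  # True while inside the contents of a top-level group
    for ch in text:
        if ch == "(":
            depth += 1
            if depth > 1:
                # Assumption: a nested group can only belong to the contents.
                skipping = True
        elif ch == ")":
            if depth == 0:
                raise ValueError("unbalanced ')' in input")
            depth -= 1
            if depth == 0:
                skipping = False  # the closing ')' of the group is kept
        elif ch == ":" and depth == 1:
            skipping = True
        if not skipping:
            out.append(ch)
    if depth != 0:
        raise ValueError("unbalanced '(' in input")
    return "".join(out)

print(keep_headers("normalcontent1-(header:content)-normalcontent2-normalcontent3-(header2:content2)"))
```

Unlike a fixed regular expression, this handles notes nested to any depth, and it reports unbalanced parentheses instead of silently producing wrong output.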
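Can you give us a quick example of what the data actually looks like? If a header, content, or note can contain (, ), :, or a comma, it might interfere with someone's solution. And the formal question: what have you tried so far? Specifically, show us your code.

@adsmith: I updated my question.

If your header, content, and notes can contain any character, then this cannot be done reliably with a regex: there are no known delimiters.

@adsmith: I updated my question again.

Great! Thank you very much! I updated my question again. Can you help me?

@Pacman if you change re.split((?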
import pyparsing as pp
import re

txt='''normalcontent1-(header1:content1(note1, note2),content2(note3),content3)-(header2:content)-normalcontent2-(header3)
normalcontent1-(header:content)-normalcontent2-normalcontent3-(header2:content2)'''

def DashSplit(txt):
    ''' Replicate the function of str.split('-') but do not split on dashes inside nested '( )' expressions'''
    com_lok=[]
    dash = pp.Suppress('-')
    # note the location of each dash outside an ignored expression:
    dash.setParseAction(lambda s, lok, toks: com_lok.append(lok))
    ident = pp.Word(pp.alphas+"_", pp.alphanums+"_")  # python, C type identifier
    exp = pp.nestedExpr()                             # Ignore everything inside nested '( )'

    atom = ident | exp 
    expr = pp.OneOrMore(atom) + pp.ZeroOrMore(dash  + atom )
    try:
        result=expr.parseString(txt)
    except pp.ParseException as e:
        print('nope', e)
        return [txt]
    else:    
        return [txt[st:end] for st,end in zip([0]+[e+1 for e in com_lok],com_lok+[len(txt)])]      

def headerGetter(txt):
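    # Keep only the leading "(header" of a parenthesized element;
    # elements that do not start with "(" are returned unchanged.
    m = re.match(r'\((\w+)', txt)
    if m:
        return '(' + m.group(1) + ')'
    else:
        return txt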

for line in txt.splitlines():    
    print('-'.join(headerGetter(e) for e in DashSplit(line))) 

normalcontent1-(header1)-(header2)-normalcontent2-(header3)
normalcontent1-(header)-normalcontent2-normalcontent3-(header2)