Python: translating regex match groups

python regex

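I need to match name objects as defined by the PDF file specification. Names may, however, contain two-digit hexadecimal codes (preceded by #) that stand for special characters. I would like to translate those escapes into the corresponding characters. Is there a clever way to do this without re-parsing the matched string?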

import re

Name = re.compile(r'''
    (/                                        # Literal "/"
        (?:                                   #
            (?:\#[A-Fa-f0-9]{2})              # Hex numbers
            |                                 # 
            [^\x00-\x20 \x23 \x2f \x7e-\xff]  # Other
        )+                                    #
    )                                         #
    ''', re.VERBOSE)

#  some examples

names = """
    The following are examples of valid literal names:

    Raw string                       Translation

    1.  /Adobe#20Green            -> "Adobe Green"
    2.  /PANTONE#205757#20CV      -> "PANTONE 5757 CV"
    3.  /paired#28#29parentheses  -> "paired()parentheses"
    4.  /The_Key_of_F#23_Minor    -> "The_Key_of_F#_Minor"
    5.  /A#42                     -> "AB"
    6.  /Name1
    7.  /ASomewhatLongerName
    8.  /A;Name_With-Various***Characters?
    9.  /1.2
    10. /$$
    11. /@pattern
    12. /.notdef
    """

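Take a look at re.sub.

You can pass it a function as the replacement: match the hex escapes with "#[0-9A-F]{2}" and let the function convert each one to its character.

Applied to "/Adobe#20Green", for example, this returns "/Adobe Green".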
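The example this answer originally carried did not survive on this page, so the following is only a minimal sketch of the idea, not the answerer's exact code. The names HEX_ESCAPE and unescape_name are made up for illustration, and the pattern also accepts lowercase hex digits, which is an assumption beyond the "#[0-9A-F]{2}" quoted above.

```python
import re

# Hypothetical helper names; "#" followed by two hex digits, either case.
HEX_ESCAPE = re.compile(r'#([0-9A-Fa-f]{2})')

def unescape_name(raw):
    """Replace every #xx escape in a PDF name with the character it encodes."""
    # re.sub calls the function once per match and inserts its return value.
    return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), raw)

print(unescape_name('/Adobe#20Green'))  # /Adobe Green
```

Because re.sub does not rescan the text it has just inserted, a name such as /The_Key_of_F#23_Minor ends up with a literal "#" that is not interpreted again.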

I would use finditer() with a wrapping generator:

import re
from functools import partial

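def _hexrepl(match):
    # Turn the two hex digits captured in group 1 into the character they encode.
    return chr(int(match.group(1), 16))

# Accept lowercase hex digits as well, since the Name pattern allows them.
unescape = partial(re.compile(r'#([0-9A-Fa-f]{2})').sub, _hexrepl)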

def pdfnames(inputtext):
    for match in Name.finditer(inputtext):
        yield unescape(match.group(0))

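Demo: the interactive session is listed at the end of this page.

I know of no smarter way than this; the re engine cannot combine substitution with matching.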
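One way to see the limitation is that a capture group inside a repeated part of a pattern keeps only its last match, so a single match object cannot hand back every hex escape in a name. The sketch below illustrates this with a simplified pattern, which is an assumption for demonstration purposes and not the Name pattern from the question.

```python
import re

# Simplified stand-in for the question's pattern: group 1 captures the hex digits.
pattern = re.compile(r'/(?:#([0-9A-Fa-f]{2})|[^\s#/])+')

m = pattern.match('/paired#28#29parentheses')
print(m.group(0))  # /paired#28#29parentheses
print(m.group(1))  # 29  (the earlier escape, 28, is no longer available)
```

This is why the match is followed by a second pass over the matched text to translate the escapes.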

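Comments:

Do you want to replace these within the input string? Or do you want to list the matching names with the sequences unescaped?

Can you use re.sub?

@MartijnPieters -- List the matching names with the sequences unescaped.

@Jack -- Yes, I can use re.sub on the matches, but I was wondering whether there is a "smarter" way...

Doing the matching and the conversion in two separate steps has its merits ;-)

@MartijnPieters Then he should also have had the wisdom to mention that in the question, and why it does not work :)

@Jack: I asked for clarification; it was not clear what he wanted.

@Martijn Pieters Yes, maybe root is looking for a "single-pass regex".

Sorry about the confusion... I had actually implemented something similar to your approach and Martijn's, but I thought there might be a better way. Anyway, +1. Thanks.

Then re.sub it is :)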

Demo session for the pdfnames() generator from the answer above:

>>> for name in pdfnames(names):
...     print(name)
... 
/Adobe Green
/PANTONE 5757 CV
/paired()parentheses
/The_Key_of_F#_Minor
/AB
/Name1
/ASomewhatLongerName
/A;Name_With-Various***Characters?
/1.2
/$$
/@pattern
/.notdef