How to get the corresponding characters using a pattern in Python


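I am working on a function that converts a bracket pattern such as "[a-c]" into "a", "b" and "c".

I do not mean pattern matching in Python. I mean that I take "[a-c]" as input and output the corresponding "a", "b" and "c", which are the characters that "[a-c]" can match in a Python regular expression. What I want is the set of matching characters.

We only need to consider [a-zA-Z0-9_-] as valid characters inside the brackets. Modifiers such as "*", "+" or "?" are out of scope.

However, it is hard to write a robust implementation because there are so many cases to consider. So I would like to know whether Python has a tool that can do this.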
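To make the goal concrete, here is a minimal sketch of the requested transformation for the simplest case only. The function name `expand_single_range` is hypothetical, and the sketch assumes the input has exactly the form "[x-y]" with x not after y; it does not handle any of the harder cases discussed below.

```python
def expand_single_range(pattern):
    # Assumes exactly the form "[x-y]" with x <= y, e.g. "[a-c]".
    lo, hi = pattern[1], pattern[3]
    # ord/chr convert between characters and their code points.
    return [chr(code) for code in range(ord(lo), ord(hi) + 1)]

print(expand_single_range("[a-c]"))  # ['a', 'b', 'c']
```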

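Note: as @swenzel pointed out, this has some bugs. I have written a function that does the job; you can view it here.

I recommend @swenzel's second proposal.

For more information on re.findall, see its documentation. This is a simple solution that may work for you: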

import re
import string

def expand(pattern):
    """
    Returns a list of characters that can be matched by the given pattern.
    """
    pattern = pattern[1:-1] # ignore the leading '[' and trailing ']'
    result = []
    lower_range_re = re.compile('[a-z]-[a-z]')
    upper_range_re = re.compile('[A-Z]-[A-Z]')
    digit_range_re = re.compile('[0-9]-[0-9]')

    for match in lower_range_re.findall(pattern):
        result.extend(string.ascii_lowercase[string.ascii_lowercase.index(match[0]):string.ascii_lowercase.index(match[2]) + 1])
    for match in upper_range_re.findall(pattern):
        result.extend(string.ascii_uppercase[string.ascii_uppercase.index(match[0]):string.ascii_uppercase.index(match[2]) + 1])
    for match in digit_range_re.findall(pattern):
        result.extend(string.digits[string.digits.index(match[0]):string.digits.index(match[2]) + 1])
    return result

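It should work for patterns such as [b-g], [0-3], [G-N] and [b-gG-N1-3]. It does not work for patterns such as [abc] or [0123], because it only expands ranges and ignores literal characters.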
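One possible way to cover literal characters as well is to tokenize the bracket body into ranges and single characters in one pass. The following is only a sketch under stated assumptions: the names `TOKEN_RE` and `expand_with_literals` are hypothetical, ranges are only recognized within one class (lowercase, uppercase or digits), and anything else, including a mixed-class sequence like "a-Z", is treated as literal characters rather than rejected.

```python
import re

# A token is either a same-class range such as "b-g" or one literal character.
TOKEN_RE = re.compile(r'([a-z]-[a-z]|[A-Z]-[A-Z]|[0-9]-[0-9])|(.)', re.DOTALL)

def expand_with_literals(pattern):
    body = pattern[1:-1]  # drop the leading '[' and trailing ']'
    result = []
    for rng, literal in TOKEN_RE.findall(body):
        if rng:
            lo, hi = rng[0], rng[2]
            if lo > hi:
                # re itself rejects reversed ranges, so this sketch does too.
                raise ValueError("reversed range: %s" % rng)
            chars = [chr(code) for code in range(ord(lo), ord(hi) + 1)]
        else:
            chars = [literal]
        for ch in chars:
            if ch not in result:  # keep first occurrence, preserve order
                result.append(ch)
    return result
```

With this sketch, expand_with_literals('[abc]') yields the three literals, and a trailing dash as in '[a-c_-]' is kept as a literal character.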

This solution does not use regex, so it may not be quite what you are after, but here it is:

pattern = '[a-c]'
excludes = '[-]' # Or use includes if that is easier
result = []
for char in pattern:
    if char not in excludes: # if char in includes:
        result.append(char)
        print(char)

Or take a look here:

This sounds like homework... but here goes.
As far as I know, range definitions need a parser.
Here you go:

def parseRange(rangeStr, i=0):
    # Recursion anchor, return empty set if we're out of bounds
    if i >= len(rangeStr):
        return set()

    # charSet will tell us later if we actually have a range here
    charSet = None

    # There can only be a range if we have more than 2 characters left in the
    # string and if the next character is a dash
    if i+2 < len(rangeStr) and rangeStr[i+1] == '-':

        # We might have a range. Valid ranges are between the following pairs of
        # characters
        pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]

        for lo, hi in pairs:
            # We now make use of the fact that characters are comparable.
            # Also the second character should come after the first, or be
            # the same which means e.g. 'a-a' -> 'a'
            if (lo <= rangeStr[i] <= hi) and \
               (rangeStr[i] <= rangeStr[i+2] <= hi):
                   # Retrieve the set with all chars from the substring
                   charSet = parseRange(rangeStr, i+3)

                   # Extend the chars from the substring with the ones in this
                   # range.
                   # `range` needs integers, so we transform the chars to ints
                   # using ord and make use of the fact that their ASCII
                   # representation is ascending
                   charSet.update(chr(k) for k in
                           range(ord(rangeStr[i]), 1+ord(rangeStr[i+2])))
                   break

    # If charSet is not yet defined this means that at the current position
    # there is not a valid range definition. So we just get all chars for the
    # following subset and add the current char
    if charSet is None:
        charSet = parseRange(rangeStr, i+1)
        charSet.add(rangeStr[i])

    # Return the char set with all characters defined within rangeStr[i:]
    return charSet
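Note that parseRange takes the body without the surrounding brackets (for example 'a-c'), and that it recurses once per token, so very long inputs could run into Python's recursion limit. The same rules can be written as a loop; the following is a sketch of that idea, with the hypothetical name `parse_range_iter`, not a replacement endorsed by the answer's author.

```python
def parse_range_iter(range_str):
    # Same range rules as the recursive version: both ends must lie in the
    # same class and the end must not come before the start.
    pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]
    chars = set()
    i = 0
    while i < len(range_str):
        cur = range_str[i]
        is_range = False
        if i + 2 < len(range_str) and range_str[i + 1] == '-':
            end = range_str[i + 2]
            is_range = any(lo <= cur <= end <= hi for lo, hi in pairs)
        if is_range:
            # Consume "x-y" and add every character from x to y inclusive.
            chars.update(chr(k) for k in range(ord(cur), ord(end) + 1))
            i += 3
        else:
            # Not a valid range here, so the current character is a literal.
            chars.add(cur)
            i += 1
    return chars
```

As in the recursive version, a sequence that is not a valid range (such as 'c-a') is not rejected; its characters are simply added as literals.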

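Do you mean regular expressions?

@HiteshDharamdasani, no, I am not trying to do pattern matching. I mean that I take "[a-c]" as input and output the corresponding "a", "b" and "c", which are the characters that "[a-c]" can match in a Python regular expression.

@andy Then why is your question tagged "pattern-matching"? Describe what you want to achieve. What should your function be able to expand? Only constructs of this kind?

@geckon, I have removed the tag in case it made the question unclear.

You will quickly run into polynomial complexity. Because you have to use brute force (smart or not), the only way to get all possible matches is to enumerate the problem space, which can take a long time.

I don't think this does what the OP asked for. It just returns the boundaries, doesn't it?

I have written something similar to the first function. I had not thought of the second one you proposed. It is robust, but I think it will be slow because we match each regular expression (26+26+10+2) times.

@andy Checking 64 matches is far from slow, especially since you only match one character at a time, and none of the test cases contains more than two range definitions. The only problem here is that you cannot influence what counts as valid and what does not. For example '[3cN-\' is a perfectly valid regular expression, but you want it to raise an exception, which will not happen when using re.

You are right about [3cN-\\\\. As I said, I find it hard to write a robust one myself. Thanks for your suggestion. I think I will go with your second proposal, which always does the job and stays robust. In the end, correctness takes priority over speed. :)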

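import re

def parseRangeRe(rangeStr):
    # rangeStr is the full bracket pattern, e.g. '[a-c]'.
    # Despite its name, master_pattern is the alphabet of candidate characters.
    master_pattern = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"
    matcher = re.compile(rangeStr)
    return set(matcher.findall(master_pattern))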
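The comments note that with this approach re decides what is valid. One way to regain some control is to check the input against a stricter form before handing it to re. The following is a sketch under assumptions: `ALPHABET`, `ALLOWED_FORM` and `expand_strict` are hypothetical names, and the strictness rule (one bracket pair enclosing only characters from [a-zA-Z0-9_-]) is taken from the question's stated scope.

```python
import re
import string

# Candidate characters: the 26 + 26 + 10 + 2 characters from the question.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits + "_-"

# Assumed strictness rule: one bracket pair around allowed characters only.
ALLOWED_FORM = re.compile(r'\[[a-zA-Z0-9_-]+\]')

def expand_strict(pattern):
    if not ALLOWED_FORM.fullmatch(pattern):
        raise ValueError("unsupported pattern: %r" % pattern)
    try:
        matcher = re.compile(pattern)
    except re.error as exc:
        # re rejects reversed ranges such as '[c-a]'.
        raise ValueError("invalid range in %r: %s" % (pattern, exc))
    # Each match is a single character because the pattern is one class.
    return sorted(set(matcher.findall(ALPHABET)))
```

This still lets re interpret ranges that span classes, such as '[A-z]', so a stricter range check would be needed if those should be rejected too.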