Python: Parsing complex filter definitions with pyparsing

I am trying to parse complex filter definitions that will be applied to a set of data. A typical filter might look like this:

attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)

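Of course, filters can get more complicated, with more levels of nesting and more logical operators. In the end it boils down to:

Extracting the "unit filter expressions", such as

attribute1==value1

Running each of those filters against the data set

Combining the results using intersection (and) and union (or)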
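
Setting the parsing aside for a moment, the second and third steps can be sketched roughly as follows. This is only a minimal sketch under assumptions: the data set is taken to be a list of dicts, the parsed filter is taken to be already available as plain nested lists, and RECORDS, COMPARATORS, run_unit_filter and evaluate are hypothetical names chosen for illustration.

```python
import operator

# Hypothetical sample data: each record maps attribute names to values.
RECORDS = [
    {"attribute1": "value1", "attribute2": 5, "attribute3": "value3"},
    {"attribute1": "value1", "attribute2": 1, "attribute3": "value3"},
    {"attribute1": "other", "attribute2": 9, "attribute3": "x"},
    {"attribute1": "value1", "attribute2": 0, "attribute3": "x"},
]

# Map each comparison operator in the filter language to a Python function.
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def run_unit_filter(unit, records):
    """Return the indexes of the records matching one [attribute, op, value] triple."""
    attribute, op, value = unit
    compare = COMPARATORS[op]
    # Assumes the stored value and the filter value have comparable types.
    return {i for i, record in enumerate(records)
            if attribute in record and compare(record[attribute], value)}


def evaluate(node, records):
    """Evaluate a nested [operand, 'and' | 'or', operand, ...] list to a set of indexes."""
    if len(node) == 3 and isinstance(node[0], str) and node[1] in COMPARATORS:
        return run_unit_filter(node, records)
    result = evaluate(node[0], records)
    # Each nesting level is assumed to hold a single kind of operator.
    for op, operand in zip(node[1::2], node[2::2]):
        matched = evaluate(operand, records)
        result = result & matched if op == "and" else result | matched
    return result


tree = [["attribute1", "==", "value1"], "and",
        [["attribute2", ">=", 3], "or", ["attribute3", "!=", "value3"]]]
print(sorted(evaluate(tree, RECORDS)))  # [0, 3]
```

Working with sets of record indexes keeps "and" and "or" as plain set intersection and union, whatever the nesting depth.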

I have heavily reused some examples provided by Paul McGuire, so my code looks like this:

import pyparsing

def process_results(result):
    for key in result.keys():
        print(key + ":" + str(result[key]))
        if key == 'complex_filter':
            process_results(result[key])


def parse_filter(filter_string):
    # break these up so we can represent higher precedence for 'and' over 'or'
    not_operator        = pyparsing.oneOf(['not','^'], caseless=True).setResultsName("operator")
    and_operator        = pyparsing.oneOf(['and','&'], caseless=True).setResultsName("operator")
    or_operator         = pyparsing.oneOf(['or' ,'|'], caseless=True).setResultsName("operator")

    # db_keyword is okay, but you might just want to use a general 'identifier' expression,
    # you won't have to keep updating as you add other terms to your query language
    ident = pyparsing.Word(pyparsing.alphas+'_'+'-', pyparsing.alphanums+'_'+'-')

    # comparison operators
    comparison_operator = pyparsing.oneOf(['==','!=','>','>=','<', '<='])

    # instead of generic 'value', define specific value types
    integer = pyparsing.Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
    float_ = pyparsing.Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))

    # use pyparsing's QuotedString class for this, it gives you quote escaping, and
    # automatically strips quotes from the parsed text
    quote = pyparsing.QuotedString('"')

    # when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
    literal_true = pyparsing.Keyword('true', caseless=True)
    literal_false = pyparsing.Keyword('false', caseless=True)
    boolean_literal = literal_true | literal_false

    # in future, you can expand comparison_operand to be its own operatorPrecedence
    # term, so that you can do things like "nucleon != 1+2" - but this is fine for now
    comparison_operand = quote | ident | float_ | integer
    comparison_expr = pyparsing.Group((quote | ident) + comparison_operator + comparison_operand).setResultsName("unit_filter", listAllMatches=True )


    grammar = pyparsing.infixNotation(comparison_expr,
        [
        (not_operator, 1, pyparsing.opAssoc.RIGHT),
        (and_operator, 2, pyparsing.opAssoc.LEFT),
        (or_operator,  2, pyparsing.opAssoc.LEFT),
        ]
    ).setResultsName("complex_filter")

    res = grammar.parseString(filter_string, parseAll=True)

    return res

res = parse_filter('attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)')

process_results(res)

As you can see, it does not keep looping through the "nested" results. I would like the output to be:

complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
unit_filter:[['attribute1', '==', 'value1']]
operator:and
complex_filter: [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]
unit_filter:[['attribute2', '>=', 3]]
operator:or
unit_filter:[['attribute3', '!=', 'value3']]

Any idea what I can do to get there? Thanks.

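Rather than basing your evaluation strategy on results names, try using classes that model each level of operator precedence. You can see an example of this in the SimpleBool.py parser on the pyparsing wiki Examples page. With suitable __repr__ or __str__ methods on the classes, you should be able to get the output you want.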
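
A rough sketch of that idea, in the spirit of SimpleBool.py but not copied from it, might look like the following. The class names UnitFilter and BinaryFilter, the walk and as_list methods and the build_grammar helper are hypothetical; the grammar is a trimmed version of the one in the question, without the "not" operator, and the camelCase pyparsing names match those already used there.

```python
import pyparsing as pp


class UnitFilter:
    """One comparison such as attribute2 >= 3 (hypothetical helper class)."""

    def __init__(self, tokens):
        # The parse action receives the three tokens of the comparison.
        self.attribute, self.op, self.value = tokens

    def as_list(self):
        return [self.attribute, self.op, self.value]

    def walk(self):
        yield "unit_filter:%s" % [self.as_list()]


class BinaryFilter:
    """Operands joined by one kind of operator at a single precedence level."""

    def __init__(self, tokens):
        group = tokens[0]  # [operand, operator, operand, ...]
        self.operator = group[1]
        self.operands = list(group[0::2])

    def as_list(self):
        out = [self.operands[0].as_list()]
        for operand in self.operands[1:]:
            out += [self.operator, operand.as_list()]
        return out

    def walk(self):
        # Report this level first, then recurse into every operand.
        yield "complex_filter:%s" % self.as_list()
        for i, operand in enumerate(self.operands):
            if i:
                yield "operator:%s" % self.operator
            yield from operand.walk()


def build_grammar():
    ident = pp.Word(pp.alphas + "_-", pp.alphanums + "_-")
    integer = pp.Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))
    float_ = pp.Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
    quote = pp.QuotedString('"')
    comparison_operator = pp.oneOf(["==", "!=", ">", ">=", "<", "<="])
    operand = quote | ident | float_ | integer
    # Each comparison becomes a UnitFilter object instead of a named group.
    unit = ((quote | ident) + comparison_operator + operand).setParseAction(UnitFilter)
    and_operator = pp.oneOf(["and", "&"], caseless=True)
    or_operator = pp.oneOf(["or", "|"], caseless=True)
    return pp.infixNotation(unit, [
        (and_operator, 2, pp.opAssoc.LEFT, BinaryFilter),
        (or_operator, 2, pp.opAssoc.LEFT, BinaryFilter),
    ])


text = "attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)"
tree = build_grammar().parseString(text, parseAll=True)[0]
for line in tree.walk():
    print(line)
```

Because every precedence level is turned into an object while parsing, the recursion lives in the walk methods and no longer depends on which results names survive the nesting.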