Python: parsing complex filter definitions with pyparsing

I am trying to parse complex filter definitions that will be applied to a set of data. A typical filter might look like this:

    attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)

Of course, filters can become more complex, with more nesting levels and more logical operators. It boils down to:

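  • Extract the "unit filter expressions", e.g.
    attribute1==value1
  • Run each unit filter against the data set
  • Combine the results using intersection (and) and union (or)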
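
The steps above can be sketched without any parsing at all, assuming the filter has already been reduced to nested lists of the form `[name, op, value]` joined by `'and'` / `'or'`, and that the data set is a list of dicts. The names `COMPARATORS` and `matching_rows` and the sample records are hypothetical, chosen only for illustration:

```python
import operator

# Map each comparison token to the function that implements it.
COMPARATORS = {
    "==": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
}


def matching_rows(records, node):
    """Return the set of indices of the records matched by a nested filter."""
    if isinstance(node[0], str):
        # Unit filter: [name, op, value]
        name, op, value = node
        compare = COMPARATORS[op]
        return {i for i, rec in enumerate(records)
                if name in rec and compare(rec[name], value)}
    # Complex filter: [operand, op, operand, op, operand, ...]
    # Assumption: operators are applied left to right within one level;
    # precedence is expected to be expressed through the nesting itself.
    result = matching_rows(records, node[0])
    for op, operand in zip(node[1::2], node[2::2]):
        other = matching_rows(records, operand)
        result = result & other if op == "and" else result | other
    return result


# Hypothetical sample data
records = [
    {"attribute1": "value1", "attribute2": 5, "attribute3": "value3"},
    {"attribute1": "value1", "attribute2": 1, "attribute3": "value3"},
    {"attribute1": "other", "attribute2": 9, "attribute3": "x"},
]
nested = [["attribute1", "==", "value1"], "and",
          [["attribute2", ">=", 3], "or", ["attribute3", "!=", "value3"]]]
print(matching_rows(records, nested))  # {0}
```

Working with sets of row indices keeps "and" and "or" down to a single set operation each, whatever the nesting depth.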

I have heavily reused some examples provided by Paul McGuire, so my code looks like this:

    import pyparsing
    
    def process_results(result):
        for key in result.keys():
            print(key + ":" + str(result[key]))
            if key == 'complex_filter':
                process_results(result[key])
    
    
    def parse_filter(filter_string):
        # break these up so we can represent higher precedence for 'and' over 'or'
        not_operator        = pyparsing.oneOf(['not','^'], caseless=True).setResultsName("operator")
        and_operator        = pyparsing.oneOf(['and','&'], caseless=True).setResultsName("operator")
        or_operator         = pyparsing.oneOf(['or' ,'|'], caseless=True).setResultsName("operator")
    
        # db_keyword is okay, but you might just want to use a general 'identifier' expression,
        # you won't have to keep updating as you add other terms to your query language
        ident = pyparsing.Word(pyparsing.alphas+'_'+'-', pyparsing.alphanums+'_'+'-')
    
        # comparison operators
        comparison_operator = pyparsing.oneOf(['==','!=','>','>=','<', '<='])
    
        # instead of generic 'value', define specific value types
        integer = pyparsing.Regex(r'[+-]?\d+').setParseAction(lambda t:int(t[0]))
        float_ = pyparsing.Regex(r'[+-]?\d+\.\d*').setParseAction(lambda t:float(t[0]))
    
        # use pyparsing's QuotedString class for this, it gives you quote escaping, and
        # automatically strips quotes from the parsed text
        quote = pyparsing.QuotedString('"')
    
        # when you are doing boolean expressions, it's always handy to add TRUE and FALSE literals
        literal_true = pyparsing.Keyword('true', caseless=True)
        literal_false = pyparsing.Keyword('false', caseless=True)
        boolean_literal = literal_true | literal_false
    
        # in future, you can expand comparison_operand to be its own operatorPrecedence
        # term, so that you can do things like "nucleon != 1+2" - but this is fine for now
        comparison_operand = quote | ident | float_ | integer
        comparison_expr = pyparsing.Group((quote | ident) + comparison_operator + comparison_operand).setResultsName("unit_filter", listAllMatches=True )
    
    
        grammar = pyparsing.infixNotation(comparison_expr,
            [
            (not_operator, 1, pyparsing.opAssoc.RIGHT),
            (and_operator, 2, pyparsing.opAssoc.LEFT),
            (or_operator,  2, pyparsing.opAssoc.LEFT),
            ]
        ).setResultsName("complex_filter")
    
        res = grammar.parseString(filter_string, parseAll=True)
    
        return res
    
    res = parse_filter('attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)')
    
    process_results(res)
    
As you can see, it does not loop all the way through the "nested" results. I would like the output to be:

    complex_filter:[['attribute1', '==', 'value1'], 'and', [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]]
    unit_filter:[['attribute1', '==', 'value1']]
    operator:and
    complex_filter: [['attribute2', '>=', 3], 'or', ['attribute3', '!=', 'value3']]
    unit_filter:[['attribute2', '>=', 3]]
    operator:or
    unit_filter:[['attribute3', '!=', 'value3']]
    

Any idea what I can do to get there? Thanks.

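Rather than basing your evaluation strategy on results names, try using classes that model each level of operator precedence. You can see an example of this in the SimpleBool.py parser on the Examples page of the pyparsing wiki. With suitable repr or str methods on those classes, you should be able to get the output you want.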
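
A rough sketch of that suggestion, not the SimpleBool.py code itself: each precedence level gets a small class, attached as the optional fourth element of its `infixNotation` tuple, and each class knows how to describe itself. The class names (`Comparison`, `BinaryOp`, `NotOp`), the `as_list` / `lines` methods and `parse_filter_tree` are assumptions made for illustration, and the terminals are simplified compared to the question's grammar. It uses the same camelCase pyparsing names as the question's code:

```python
import pyparsing as pp


class Comparison:
    """Leaf node: one unit filter such as attribute2 >= 3."""

    def __init__(self, tokens):
        self.name, self.op, self.value = tokens

    def as_list(self):
        return [self.name, self.op, self.value]

    def lines(self):
        yield f"unit_filter:{[self.as_list()]!r}"


class BinaryOp:
    """'and' / 'or' node; infixNotation passes [[operand, op, operand, ...]]."""

    def __init__(self, tokens):
        items = list(tokens[0])
        self.op = items[1]
        self.operands = items[0::2]

    def as_list(self):
        out = [self.operands[0].as_list()]
        for operand in self.operands[1:]:
            out += [self.op, operand.as_list()]
        return out

    def lines(self):
        yield f"complex_filter:{self.as_list()!r}"
        for i, operand in enumerate(self.operands):
            if i:
                yield f"operator:{self.op}"
            yield from operand.lines()


class NotOp:
    """Unary 'not' node; infixNotation passes [[op, operand]]."""

    def __init__(self, tokens):
        self.op, self.operand = tokens[0]

    def as_list(self):
        return [self.op, self.operand.as_list()]

    def lines(self):
        yield f"complex_filter:{self.as_list()!r}"
        yield f"operator:{self.op}"
        yield from self.operand.lines()


def parse_filter_tree(filter_string):
    """Parse a filter string into a tree of the node classes above."""
    ident = pp.Word(pp.alphas + "_", pp.alphanums + "_-")
    integer = pp.Regex(r"[+-]?\d+").setParseAction(lambda t: int(t[0]))
    float_ = pp.Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t: float(t[0]))
    operand = float_ | integer | pp.QuotedString('"') | ident
    comparison = ident + pp.oneOf("== != >= <= > <") + operand
    comparison.setParseAction(Comparison)
    grammar = pp.infixNotation(comparison, [
        (pp.CaselessKeyword("not") | pp.Literal("^"), 1, pp.opAssoc.RIGHT, NotOp),
        (pp.CaselessKeyword("and") | pp.Literal("&"), 2, pp.opAssoc.LEFT, BinaryOp),
        (pp.CaselessKeyword("or") | pp.Literal("|"), 2, pp.opAssoc.LEFT, BinaryOp),
    ])
    return grammar.parseString(filter_string, parseAll=True)[0]


tree = parse_filter_tree(
    'attribute1 == value1 and (attribute2 >= 3 or attribute3 != value3)')
for line in tree.lines():
    print(line)
```

Because every node walks its own children, the recursion follows the nesting of the parsed expression instead of depending on which results names happen to be visible at the top level. An evaluation method that filters a data set could be added to the same classes in the same way.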