帮助我找到合适的ruby/python解析器生成器

帮助我找到合适的ruby/python解析器生成器,python,ruby,debugging,parser-generator,Python,Ruby,Debugging,Parser Generator,我使用的第一个解析器生成器是Parse::RecDescent,可用的指南/教程非常棒,但它最有用的功能是它的调试工具,特别是跟踪功能(通过将$RD_TRACE设置为1激活)。我正在寻找一个解析器生成器,可以帮助您调试它的规则 问题是,它必须用python或ruby编写,并具有详细模式/跟踪模式或非常有用的调试技术 有人知道这样的解析器生成器吗 编辑:当我说调试时,我不是指调试python或ruby。我指的是调试解析器生成器,查看它在每一步都在做什么,查看它读取的每个字符,以及它试图匹配的规则。

我使用的第一个解析器生成器是Parse::RecDescent,可用的指南/教程非常棒,但它最有用的功能是它的调试工具,特别是跟踪功能(通过将$RD_TRACE设置为1激活)。我正在寻找一个解析器生成器,可以帮助您调试它的规则

问题是,它必须用python或ruby编写,并具有详细模式/跟踪模式或非常有用的调试技术

有人知道这样的解析器生成器吗

编辑:当我说调试时,我不是指调试python或ruby。我指的是调试解析器生成器,查看它在每一步都在做什么,查看它读取的每个字符,以及它试图匹配的规则。希望你明白我的意思


赏金编辑:为了赢得赏金,请展示一个解析器生成器框架,并说明它的一些调试功能。我重复一遍,我对pdb不感兴趣,但对解析器的调试框架感兴趣。另外,请不要提及树梢。我对它不感兴趣。

Python wiki有许多用Python编写的语言解析器。

Python是一种非常容易调试的语言。只需导入pdb pdb.settrace()

然而,这些解析器生成器应该具有良好的调试功能

对赏金的回应

下面是正在运行的PLY调试

源代码

tokens = (
    'NAME','NUMBER',
    )

literals = ['=','+','-','*','/', '(',')']

# Tokens

t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'

def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

t_ignore = " \t"

def t_newline(t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")

def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
import ply.lex as lex
lex.lex(debug=1)

# Parsing rules

precedence = (
    ('left','+','-'),
    ('left','*','/'),
    ('right','UMINUS'),
    )

# dictionary of names
names = { }

def p_statement_assign(p):
    'statement : NAME "=" expression'
    names[p[1]] = p[3]

def p_statement_expr(p):
    'statement : expression'
    print(p[1])

def p_expression_binop(p):
    '''expression : expression '+' expression
                  | expression '-' expression
                  | expression '*' expression
                  | expression '/' expression'''
    if p[2] == '+'  : p[0] = p[1] + p[3]
    elif p[2] == '-': p[0] = p[1] - p[3]
    elif p[2] == '*': p[0] = p[1] * p[3]
    elif p[2] == '/': p[0] = p[1] / p[3]

def p_expression_uminus(p):
    "expression : '-' expression %prec UMINUS"
    p[0] = -p[2]

def p_expression_group(p):
    "expression : '(' expression ')'"
    p[0] = p[2]

def p_expression_number(p):
    "expression : NUMBER"
    p[0] = p[1]

def p_expression_name(p):
    "expression : NAME"
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0

def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        print("Syntax error at EOF")

import ply.yacc as yacc
yacc.yacc()

import logging
logging.basicConfig(
    level=logging.INFO,
    filename="parselog.txt"
)

while 1:
    try:
        s = raw_input('calc > ')
    except EOFError:
        break
    if not s: continue
    yacc.parse(s, debug=1)
输出

lex: tokens   = ('NAME', 'NUMBER')
lex: literals = ['=', '+', '-', '*', '/', '(', ')']
lex: states   = {'INITIAL': 'inclusive'}
lex: Adding rule t_NUMBER -> '\d+' (state 'INITIAL')
lex: Adding rule t_newline -> '\n+' (state 'INITIAL')
lex: Adding rule t_NAME -> '[a-zA-Z_][a-zA-Z0-9_]*' (state 'INITIAL')
lex: ==== MASTER REGEXS FOLLOW ====
lex: state 'INITIAL' : regex[0] = '(?P<t_NUMBER>\d+)|(?P<t_newline>\n+)|(?P<t_NAME>[a-zA-Z
_][a-zA-Z0-9_]*)'
calc > 2+3
PLY: PARSE DEBUG START

State  : 0
Stack  : . LexToken(NUMBER,2,1,0)
Action : Shift and goto state 3

State  : 3
Stack  : NUMBER . LexToken(+,'+',1,1)
Action : Reduce rule [expression -> NUMBER] with [2] and goto state 9
Result : <int @ 0x1a1896c> (2)

State  : 6
Stack  : expression . LexToken(+,'+',1,1)
Action : Shift and goto state 12

State  : 12
Stack  : expression + . LexToken(NUMBER,3,1,2)
Action : Shift and goto state 3

State  : 3
Stack  : expression + NUMBER . $end
Action : Reduce rule [expression -> NUMBER] with [3] and goto state 9
Result : <int @ 0x1a18960> (3)

State  : 18
Stack  : expression + expression . $end
Action : Reduce rule [expression -> expression + expression] with [2,'+',3] and goto state
 3
Result : <int @ 0x1a18948> (5)

State  : 6
Stack  : expression . $end
Action : Reduce rule [statement -> expression] with [5] and goto state 2
5
Result : <NoneType @ 0x1e1ccef4> (None)

State  : 4
Stack  : statement . $end
Done   : Returning <NoneType @ 0x1e1ccef4> (None)
PLY: PARSE DEBUG END
calc >

我不知道它的调试特性,但我听说了PyParsing的好东西


我知道赏金已经被宣布,但这里有一个用pyparsing编写的等价解析器(另外还支持使用零个或多个逗号删除参数的函数调用):

给出此输出:

sin(pi/2)
[['sin', [['pi', '/', '2']]]]

y = mx+b
['y', '=', ['mx', '+', 'b']]

E = mc ** 2
['E', '=', ['mc', '**', '2']]

F = m*a
['F', '=', ['m', '*', 'a']]

x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]

1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
为了进行调试,我们添加以下代码:

# enable debugging for name and number expressions
name.setDebug()
number.setDebug()
现在我们重新分析第一个测试(显示输入字符串和一个简单的列标尺):

给出此输出:

sin(pi/2)
[['sin', [['pi', '/', '2']]]]

y = mx+b
['y', '=', ['mx', '+', 'b']]

E = mc ** 2
['E', '=', ['mc', '**', '2']]

F = m*a
['F', '=', ['m', '*', 'a']]

x = x0 + v*t +a*t*t/2
['x', '=', ['x0', '+', 'v', '*', 't', '+', 'a', '*', 't', '*', 't', '/', '2']]

1 - sqrt(sin(t)**2 + cos(t)**2)
[['1', '-', ['sqrt', [[['sin', ['t']], '**', '2', '+', ['cos', ['t']], '**', '2']]]]]
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Pyparsing还支持packrat解析,这是一种解析时间记忆(阅读更多关于packrat的信息)。以下是相同的解析序列,但启用了packrat:

same parse, but with packrat parsing enabled
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']

这是一个有趣的练习,有助于我了解其他解析器库的调试功能。

ANTLR在生成人类可读和可理解的代码方面具有优势, 因为它是(一个非常复杂和强大的)自顶向下的解析器, 因此,您可以使用常规调试器逐步完成它 看看它到底在做什么

这就是为什么它是我选择的解析器生成器

像PLY这样的自底向上解析器生成器有缺点 对于更大的语法来说,这几乎是不可能理解的 调试输出的真正含义以及
解析表就是这样。

我知道。这真的没什么帮助。@Geo:为什么要投反对票(假设是你)?卡特曼怎么会知道你已经看过名单了?我想问题很具体。我要求提供具有良好调试功能的解析器生成器。他发布了一个列表,没有什么特别的。这里的第三方没有看到这个列表,很高兴看到它被提及。也许作为一个评论会更好。这很好!谢谢如果一天内没有其他答案发布,你就赢了。PyParasting不再在wikispaces.com上托管。去嘿!那红宝石的呢DPyparsing不再托管在wikispaces.com上。去
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']
same parse, but with packrat parsing enabled
1234567890123
    sin(pi/2)
Match name at loc 4(1,5)
Matched name -> ['sin']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 8(1,9)
Matched name -> ['pi']
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match name at loc 11(1,12)
Exception raised:Expected name (at char 11), (line:1, col:12)
Match number at loc 11(1,12)
Matched number -> ['2']