Parsing: How do I write a PLY grammar to parse paths?

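I'm trying to write a grammar with PLY to parse paths in a file. I'm running into shift/reduce conflicts, and I'm not sure how to change the grammar to fix them. Below is a sample of the file I'm trying to parse. The path/filename can be any acceptable Linux path.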

file : ../../dir/filename.txt
file : filename.txt
file : filename
Here is the grammar I wrote:

header : ID COLON path

path : pathexpr filename

pathexpr : PERIOD PERIOD DIVIDE pathexpr
           | PERIOD DIVIDE pathexpr
           | ID DIVIDE pathexpr 
           |
filename : ID PERIOD ID
           | ID               
Here are my tokens. I'm using the ctokens library, just to save myself the effort of writing my own.

t_ID = r'[A-Za-z_][A-Za-z0-9_]*'
t_PERIOD = r'\.'
t_DIVIDE = r'/'
t_COLON = r':'
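So, I believe there is a shift/reduce conflict in the "filename" rule, because after reading an ID the parser doesn't know whether to reduce it to "ID" or to shift and continue toward "ID PERIOD ID". I think there is another problem in the case with no path (just "filename"): the parser consumes the token as part of pathexpr instead of reducing pathexpr to empty.

How can I modify the grammar to handle these cases? Or do I perhaps need to change my tokens?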

Judging by those "t_xxx" names, I think you may be using PLY, not pyparsing. But here is a pyparsing solution to your problem; see the helpful comments in the code below:

"""
header : ID COLON path

path : pathexpr filename

pathexpr : PERIOD PERIOD DIVIDE pathexpr
           | PERIOD DIVIDE pathexpr
           | ID DIVIDE pathexpr 
           |
filename : ID PERIOD ID
           | ID 
"""

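# import only the names used below instead of a wildcard import
from pyparsing import (Word, Literal, Optional, ZeroOrMore, Combine,
                       WordEnd, alphanums)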

ID = Word(alphanums)
PERIOD = Literal('.')
DIVIDE = Literal('/')
COLON = Literal(':') 

# move this to the top, so we can reference it in a negative
# lookahead while parsing the path
file_name = ID + Optional(PERIOD + ID)

# simple path_element - not sufficient, as it will consume 
# trailing ID that should really be part of the filename
path_element = PERIOD+PERIOD | PERIOD | ID

# more complex path_element - adds lookahead to avoid consuming
# filename as a part of the path
path_element = (~(file_name + WordEnd())) + (PERIOD+PERIOD | PERIOD | ID)

# use repetition for these kind of expressions, not recursion
path_expr = path_element + ZeroOrMore(DIVIDE + path_element)

# use Combine so that all the tokens will get returned as a
# contiguous string, not as separate path_elements and slashes
path = Combine(Optional(path_expr + DIVIDE) + file_name)

# define header - note the use of results names, which will allow
# you to access the separate fields by name instead of by position
# (similar to using named groups in regexp's)
header = ID("id") + COLON + path("path")

tests = """\
file: ../../dir/filename.txt
file: filename.txt
file: filename""".splitlines()

for t in tests:
    # print() calls so the script runs on Python 3
    print(t)
    print(header.parseString(t).dump())
    print()
Prints:

file: ../../dir/filename.txt
['file', ':', '../../dir/filename.txt']
- id: file
- path: ../../dir/filename.txt

file: filename.txt
['file', ':', 'filename.txt']
- id: file
- path: filename.txt

file: filename
['file', ':', 'filename']
- id: file
- path: filename

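The simple solution is: use left recursion rather than right recursion.

LR parsers (such as PLY and yacc) prefer left recursion because it avoids growing the parser stack: with right recursion, nothing can be reduced until the whole list has been shifted. Left recursion is also usually closer to the semantics of the expression, which is useful when you want to actually interpret the language rather than just recognize it, and it often (as in this case) avoids the need to left-factor.

In this case, for example, each path segment needs to be applied to the preceding pathexpr, by looking up the segment's directory inside the directory found so far. The parser action is clear: look up $2 in $1. How would you write the action for the right-recursive version?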
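
To make "look up $2 in $1" concrete, here is a rough sketch in plain Python, with no parser involved. The names apply_segment and resolve and the starting directory are made up for illustration; the point is only that a left-recursive rule reduces after every segment, so its action behaves like a left fold over the segments.

```python
from functools import reduce
import posixpath


def apply_segment(current_dir, segment):
    """Hypothetical action body: resolve `segment` ($2) inside `current_dir` ($1)."""
    return posixpath.normpath(posixpath.join(current_dir, segment))


def resolve(start_dir, segments):
    # A left-recursive rule fires its action once per segment, in input order,
    # each time receiving the directory resolved so far. That is a left fold.
    return reduce(apply_segment, segments, start_dir)


# Segments of "../../dir/" applied to an assumed starting directory.
print(resolve("/home/user/project/src", ["..", "..", "dir"]))  # /home/user/dir
```

With the right-recursive rule, the first reduction only happens after the last segment has been read, so the action for ".." would have to wait for everything to its right before it could be applied.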

So, a simple transformation:

header   : ID COLON path

path     : pathexpr filename

pathexpr : pathexpr PERIOD PERIOD DIVIDE
         | pathexpr PERIOD DIVIDE
         | pathexpr ID DIVIDE
         |
filename : ID PERIOD ID
         | ID
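
For reference, the left-recursive grammar can be wired into PLY roughly like this. It is a minimal sketch, assuming PLY 3.x is installed. The token regexes are the ones from the question, written out by hand rather than taken from ctokens. The function names, the string-building actions and the (id, path) tuple result are illustrative choices, not something the question specifies.

```python
import ply.lex as lex
import ply.yacc as yacc

tokens = ("ID", "PERIOD", "DIVIDE", "COLON")

# Token regexes copied from the question.
t_ID = r"[A-Za-z_][A-Za-z0-9_]*"
t_PERIOD = r"\."
t_DIVIDE = r"/"
t_COLON = r":"
t_ignore = " \t"


def t_error(t):
    raise SyntaxError(f"illegal character {t.value[0]!r}")


# The first rule defined is the start symbol.
def p_header(p):
    "header : ID COLON path"
    p[0] = (p[1], p[3])


def p_path(p):
    "path : pathexpr filename"
    p[0] = p[1] + p[2]


# Left-recursive rules: p[1] is the prefix parsed so far.
def p_pathexpr_parent(p):
    "pathexpr : pathexpr PERIOD PERIOD DIVIDE"
    p[0] = p[1] + "../"


def p_pathexpr_current(p):
    "pathexpr : pathexpr PERIOD DIVIDE"
    p[0] = p[1] + "./"


def p_pathexpr_dir(p):
    "pathexpr : pathexpr ID DIVIDE"
    p[0] = p[1] + p[2] + "/"


def p_pathexpr_empty(p):
    "pathexpr :"
    p[0] = ""


def p_filename(p):
    """filename : ID PERIOD ID
                | ID"""
    p[0] = p[1] if len(p) == 2 else p[1] + "." + p[3]


def p_error(p):
    if p is None:
        raise SyntaxError("unexpected end of input")
    raise SyntaxError(f"unexpected token {p.value!r}")


lexer = lex.lex()
# debug/write_tables are PLY 3.x options; they keep yacc from writing
# parser.out and parsetab.py next to the script.
parser = yacc.yacc(debug=False, write_tables=False)

for line in ("file : ../../dir/filename.txt", "file : filename.txt", "file : filename"):
    print(parser.parse(line, lexer=lexer))
```

After an ID the parser can now decide with one token of lookahead: DIVIDE means the ID was a directory, PERIOD means it starts "ID PERIOD ID", and end of input means it was a bare filename. The empty pathexpr is reduced once at the very start, before any path token is read.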

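I believe this grammar should work, and it has the added advantage of being able to identify the individual parts of the path, such as the extension, directories, drive, and so on. I haven't written the parser yet, only this grammar: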

fullfilepath : path SLASH filename
path : root
    | root SLASH directories
root : DRIVE
    | PERCENT WIN_DEF_DIR PERCENT
directories : directory
            | directory SLASH directories
directory : VALIDNAME
filename : VALIDNAME
        | VALIDNAME DOT EXTENSION

Thanks for the reply! Sorry, yes, I did mean PLY. I originally hoped to use pyparsing but then switched to PLY, and I accidentally mixed up the names.

Thanks for the help! Changing from right recursion to left recursion fixed the problem.