Indentation-sensitive parser using Parslet in Ruby?
I am attempting to parse a simple indentation-sensitive syntax using the Parslet library in Ruby. The following is an example of the syntax I am attempting to parse:
level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2
The resulting tree would look like this:
[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]
The parser I have now can parse nodes at nesting levels 0 and 1, but cannot parse anything deeper than that:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document
end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree
rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar
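Clearly, I need a dynamic counter that expects 3 indentation nodes to match an identifier on nesting level 3.
How can I implement an indentation-sensitive syntax parser using Parslet in this way? Is it possible?
There are a few approaches.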
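One is to make node a method that takes the depth as a parameter, which lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So the recursive call will never finish.
This is why dynamic exists. It returns a parser that isn't resolved until the point where it tries to match, allowing you to recurse without problems.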
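The idea of defining node(depth) in terms of node(depth + 1) can be tried out without Parslet. What follows is only a plain-Ruby sketch, not the Parslet grammar: parse_nodes and INDENT are made-up names, and two spaces per level is an assumption. The recursive call only happens while a line is actually being consumed, which is the behaviour dynamic provides for the Parslet rule.

```ruby
# Plain-Ruby sketch of node(depth) defined in terms of node(depth + 1).
# INDENT and parse_nodes are illustrative names, not Parslet API.
INDENT = '  ' # assumed: two spaces per nesting level

# Parses consecutive nodes at exactly `depth`, starting at lines[pos].
# Returns [nodes, next_pos].
def parse_nodes(lines, pos, depth)
  nodes  = []
  prefix = INDENT * depth
  while pos < lines.length
    line = lines[pos]
    # Strip the expected indentation; nil if the line is shallower.
    rest = line.start_with?(prefix) ? line[prefix.length..-1] : nil
    # A deeper line still has leading spaces here and fails the match.
    break unless rest && rest.match?(/\A[A-Za-z0-9]+\z/)
    # Recurse only after this line has been accepted.
    children, pos = parse_nodes(lines, pos + 1, depth + 1)
    nodes << { identifier: rest, children: children }
  end
  [nodes, pos]
end

lines = [
  'level0child0',
  'level0child1',
  '  level1child0',
  '  level1child1',
  '    level2child0',
  '  level1child2'
]
tree, consumed = parse_nodes(lines, 0, 0)
```

Here tree has the same shape as the tree requested in the question, and consumed equals lines.length when every line was accepted.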
See the following code:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  def indent(depth)
    str('  '*depth)
  end

  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  def node(depth)
    indent(depth) >>
    identifier >>
    newline.maybe >>
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
  end

  rule(:document) { node(0).repeat }

  root :document
end
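This is my favoured solution.
I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced, which other rules can use in the same way they would match "{" and "}" characters. The following is my solution. It is a class, IndentParser, that any parser can extend to get nl, indent, and dedent tokens generated.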
require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end

class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg = "ignore")
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg)
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end
class IndentParser < Parslet::Parser
  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or dedent tokens (1+) then we stick these in an instance
  # variable and the high-priority indent/dedent rules will match as long as these
  # remain. The nl rule consumes the indentation itself.

  rule(:indent) { dynamic {|s,c|
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}

  rule(:dedent) { dynamic {|s,c|
    if @dedents.nil? or @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s # returns a Slice
    end

    if @indentStack.nil?
      @indentStack = [""]
    end

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent # no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent # tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of de-dents or an error

      # Find first match starting from back
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level # found it

        if level.start_with?(indent)
          # New indent is prefix, so we de-dented this level.
          count += 1
          next
        end

        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end

  rule(:nl) { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}

  rule(:unixnl) { str("\n") }
  rule(:macnl) { str("\r") }
  rule(:winnl) { str("\r\n") }
  # winnl must be tried before macnl; otherwise "\r\n" matches as a bare "\r"
  # and the "\n" is left unconsumed.
  rule(:anynl) { winnl | unixnl | macnl }
end
I'm sure a lot can be improved, but this is what I've come up with so far.
Example usage:
class MyParser < IndentParser
  rule(:colon) { str(':') >> space? }

  # Character class, so that a single space or a single tab matches.
  rule(:space) { match('[ \t]').repeat(1) }
  rule(:space?) { space.maybe }

  rule(:number) { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block) { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt) { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock) { identifier.as(:name) >> block }

  rule(:prgm) { testblock >> nl.repeat }
  root :prgm
end
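The indent-stack bookkeeping behind the INDENT and DEDENT tokens can also be looked at in isolation. The following is a rough plain-Ruby sketch of that bookkeeping only, not a replacement for IndentParser: indent_tokens is a hypothetical helper, skipping blank lines is an assumption, and so is closing all open levels at the end of the input.

```ruby
# Plain-Ruby sketch of the indent-stack bookkeeping (not Parslet).
# indent_tokens is a hypothetical helper name.
def indent_tokens(text)
  stack  = ['']  # indentation strings of the levels currently open
  tokens = []
  text.lines.map(&:chomp).each do |line|
    next if line.strip.empty?  # assumption: blank lines do not change the level
    indent = line[/\A[ \t]*/]
    if indent != stack.last
      if indent.start_with?(stack.last)
        # Getting deeper: exactly one new level is opened.
        stack << indent
        tokens << :indent
      else
        # Getting shallower: close levels until one matches exactly.
        until stack.last == indent
          unless stack.last.start_with?(indent)
            raise ArgumentError, 'Mismatched indentation level'
          end
          stack.pop
          tokens << :dedent
        end
      end
    end
    tokens << [:line, line.strip]
  end
  # Assumption: levels still open at end of input are closed implicitly.
  (stack.length - 1).times { tokens << :dedent }
  tokens
end

sample = ['block:', '  a', '  inner:', '    1', '  b'].join("\n")
tokens = indent_tokens(sample)
```

With a token stream like this, a block rule only has to look for :indent and :dedent, much as it would for "{" and "}".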