Indentation-sensitive parser using Parslet in Ruby?


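I am trying to parse a simple indentation-sensitive syntax using the Parslet library in Ruby.

The following is an example of the syntax I am trying to parse: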

level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2

The resulting tree would look like this:

[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]

The parser I have now can parse nodes at nesting levels 0 and 1, but cannot parse anything deeper than that:

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document

end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree

rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar


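Clearly I need a dynamic counter that expects 3 indentation nodes in order to match an identifier on nesting level 3.

How can I implement an indentation-sensitive syntax parser using Parslet in this way? Is it possible?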

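There are a few approaches.

Parse the document by recognising each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.

Use captures to store the current indent, and expect the next node to include that indent plus more in order to match as a child. (I didn't dig into this approach much, because the next one occurred to me.)

Rules are just methods. So you can define node as a method, which means you can pass parameters! (as follows)

This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So a naive recursive call will never finish.

This is why dynamic exists. It returns a parser that isn't resolved until the point at which Parslet tries to match it, allowing you to recurse without problems.

See the following code: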
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  def indent(depth)
    str('  '*depth)
  end

  rule(:newline) { str("\n") }

  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  def node(depth) 
    indent(depth) >> 
    identifier >> 
    newline.maybe >> 
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
  end 

  rule(:document) { node(0).repeat }

  root :document
end


This is my favorite solution.
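For comparison, the first approach listed above (read each line as an indent count plus an identifier, then rebuild the hierarchy afterwards) can be sketched in plain Ruby. This is only an illustration under assumptions: a regular expression stands in for a Parslet line rule so the snippet has no gem dependency, two spaces are taken as one indent level, and parse_lines, build_tree and SAMPLE are hypothetical names that do not appear in the answers here.

```ruby
# Sketch of "flat parse, then rebuild": not Parslet code.
INDENT_WIDTH = 2 # assumption: one nesting level is two spaces

# Step 1: turn every non-empty line into { depth:, identifier: }.
def parse_lines(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    m = line.match(/\A((?:  )*)([A-Za-z0-9]+)\z/)
    raise ArgumentError, "cannot parse line: #{line.inspect}" unless m
    { depth: m[1].length / INDENT_WIDTH, identifier: m[2] }
  end
end

# Step 2: rebuild the hierarchy from the depths.
# stack[d] is the node that currently accepts children at depth d.
def build_tree(entries)
  root = { identifier: nil, children: [] }
  stack = [root]
  entries.each do |entry|
    depth = entry[:depth]
    if depth > stack.length - 1
      raise ArgumentError, "indentation jumps too far at #{entry[:identifier]}"
    end
    node = { identifier: entry[:identifier], children: [] }
    stack = stack[0..depth]        # close every node deeper than this one
    stack.last[:children] << node  # attach to the nearest open parent
    stack << node                  # this node may receive children next
  end
  root[:children]
end

SAMPLE = [
  "level0child0",
  "level0child1",
  "  level1child0",
  "  level1child1",
  "    level2child0",
  "  level1child2"
].join("\n")

tree = build_tree(parse_lines(SAMPLE))
```

With the sample from the question, tree holds two top-level nodes, and the second one has three children of which the middle one contains level2child0. The trade-off is that the grammar stays trivial while all nesting logic lives in the post-processing step.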
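I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just produce INDENT and DEDENT tokens that other rules can use, much as they would match "{" and "}" characters. The following is my solution. It is a class, IndentParser, that any parser can extend to get generated nl, indent, and dedent tokens.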
require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end
class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg = "ignore")
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end
class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg)
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class IndentParser < Parslet::Parser

  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or dedent tokens (1+) then we stick these in instance
  # variables and the high-priority indent/dedent rules will match as long as these
  # remain. The nl rule consumes the indentation itself.

  rule(:indent)  { dynamic {|s,c| 
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}
  rule(:dedent)  { dynamic {|s,c|
    if @dedents.nil? or @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s #returns a Slice
    end

    if @indentStack.nil?
      @indentStack = [""]
    end

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent #no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent #tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of de-dents or an error

      # Find first match starting from back
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level #found it, 

        if level.start_with?(indent)
          # New indent is prefix, so we de-dented this level.
          count += 1
          next
        end

        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end
  rule(:nl)         { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}

  rule(:unixnl)     { str("\n") }
  rule(:macnl)      { str("\r") }
  rule(:winnl)      { str("\r\n") }
  # Ordered choice: try "\r\n" first, otherwise macnl would match its "\r" and strand the "\n".
  rule(:anynl)      { winnl | unixnl | macnl }

end


I'm sure there is plenty that could be improved, but this is what I've come up with so far.
Example usage:
class MyParser < IndentParser
  rule(:colon)      { str(':') >> space? }

  rule(:space)      { match[' \t'].repeat(1) }  # character class: a space or a tab
  rule(:space?)     { space.maybe }

  rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock)  { identifier.as(:name) >> block }

  rule(:prgm)       { testblock >> nl.repeat }
  root :prgm
end
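The indentation-stack bookkeeping that checkIndentation performs can also be looked at in isolation. The following is a minimal sketch in plain Ruby, not part of the IndentParser class: indentation_tokens is a hypothetical helper, skipping blank lines and emitting closing :dedent tokens at end of input are assumptions of this sketch, and the token shapes ([:indent], [:dedent], [:line, text]) are made up for illustration.

```ruby
# Sketch: derive :indent / :dedent tokens from leading whitespace using a stack
# of the indentation strings that are currently open.
def indentation_tokens(text)
  stack = [""]   # the outermost level has empty indentation
  tokens = []
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?          # assumption: blank lines never change nesting
    indent = line[/\A[ \t]*/]          # leading spaces/tabs, possibly ""
    if indent != stack.last
      if indent.start_with?(stack.last)
        # Deeper than the current level: open exactly one new block.
        stack << indent
        tokens << [:indent]
      else
        # Shallower: pop levels until we are back at a level we have seen.
        until stack.last == indent
          unless stack.length > 1 && stack.last.start_with?(indent)
            raise ArgumentError, "mismatched indentation: #{line.inspect}"
          end
          stack.pop
          tokens << [:dedent]
        end
      end
    end
    tokens << [:line, line.strip]
  end
  # Assumption of this sketch: close any blocks still open at end of input.
  (stack.length - 1).times { tokens << [:dedent] }
  tokens
end

tokens = indentation_tokens("a:\n  b\n  c:\n    d\ne")
```

A grammar can then treat [:indent] and [:dedent] like opening and closing braces, which is the design the answer above argues for: only the newline handling needs to know about whitespace.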
