Indentation-sensitive parser using Parslet in Ruby?


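I am trying to parse a simple indentation-sensitive syntax using the Parslet library in Ruby.

The following is an example of the syntax I am trying to parse: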

level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2
The resulting tree should look like this:

[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]
The parser I have now can parse nodes at nesting levels 0 and 1, but nothing deeper than that:

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document

end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree

rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  puts failure.cause.ascii_tree
end

__END__
user
  name
  age
recipe
  name
foo
bar
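Clearly I need a dynamic counter, one that expects 3 indentation nodes before it matches an identifier at nesting level 3.

How can I implement an indentation-sensitive parser with Parslet in this way? Is it even possible?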

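There are a few approaches.

  • Parse the document by recognizing each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.

  • Use captures to store the current indent and expect the next node to include that indent plus more in order to match as a child. (I didn't dig into this approach much, because the next one occurred to me.)

  • Rules are just methods. So you can define "node" as a method, which means you can pass it parameters! (Details below.)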

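  • This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser. So the recursive call never finishes.

    This is why dynamic exists. It returns a parser that isn't resolved until the moment it tries to match, which lets you recurse without problems.

    See the following code: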

    require 'parslet'
    
    class IndentationSensitiveParser < Parslet::Parser
    
      def indent(depth)
        str('  '*depth)
      end
    
      rule(:newline) { str("\n") }
    
      rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }
    
      def node(depth) 
        indent(depth) >> 
        identifier >> 
        newline.maybe >> 
        (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
      end 
    
      rule(:document) { node(0).repeat }
    
      root :document
    end
    

    This is my favored solution.
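
For comparison, the first approach in the list above (treat every line as indentation plus identifier, then rebuild the hierarchy from the indent count) can be sketched without Parslet at all. This is only an illustration under stated assumptions: build_tree and INDENT_WIDTH are hypothetical names, one nesting level is assumed to be exactly two spaces as in the question, and the hash keys are Ruby symbols mirroring the tree shown in the question.

```ruby
# Hypothetical helper, not part of Parslet: rebuilds the nested tree
# from flat lines by looking only at each line's leading spaces.
INDENT_WIDTH = 2 # assumption: one nesting level is two spaces

def build_tree(text)
  roots = []
  stack = [] # stack[d] is the most recent node seen at depth d

  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?

    spaces = line[/\A */].length
    unless (spaces % INDENT_WIDTH).zero?
      raise ArgumentError, "odd indentation: #{line.inspect}"
    end

    depth = spaces / INDENT_WIDTH
    if depth > stack.length
      raise ArgumentError, "indentation jumps too deep: #{line.inspect}"
    end

    node = { identifier: line.strip, children: [] }

    # A depth-0 node is a root; otherwise it belongs to the last node one level up.
    siblings = depth.zero? ? roots : stack[depth - 1][:children]
    siblings << node

    # Forget deeper levels, then remember this node as the latest at its depth.
    stack = stack.first(depth) << node
  end

  roots
end

input = <<~TEXT
  level0child0
  level0child1
    level1child0
    level1child1
      level2child0
    level1child2
TEXT

tree = build_tree(input)
puts tree.map { |n| n[:identifier] }.inspect
```

In a Parslet setting the same reconstruction would run as a post-processing step over the flat list of matched lines, so the grammar itself stays free of any depth counting.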

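    I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced, which other rules can then use much as they would match "{" and "}" characters. My solution is below. It is a class, IndentParser, that any parser can extend to get generated nl, indent, and dedent tokens.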

    require 'parslet'
    
    # Atoms returned from a dynamic that aren't meant to match anything.
    class AlwaysMatch < Parslet::Atoms::Base
      def try(source, context, consume_all)
        succ("")
      end
    end
    class NeverMatch < Parslet::Atoms::Base
      attr_accessor :msg
      def initialize(msg = "ignore")
        self.msg = msg
      end
      def try(source, context, consume_all)
        context.err(self, source, msg)
      end
    end
    class ErrorMatch < Parslet::Atoms::Base
      attr_accessor :msg
      def initialize(msg)
        self.msg = msg
      end
      def try(source, context, consume_all)
        context.err(self, source, msg)
      end
    end
    
    class IndentParser < Parslet::Parser
    
      ##
      # Indentation handling: when matching a newline we check the following indentation. If
      # that indicates an indent token or dedent tokens (1+) then we stick these in an instance
      # variable and the high-priority indent/dedent rules will match as long as these
      # remain. The nl rule consumes the indentation itself.
    
      rule(:indent)  { dynamic {|s,c| 
        if @indent.nil?
          NeverMatch.new("Not an indent")
        else
          @indent = nil
          AlwaysMatch.new
        end
      }}
      rule(:dedent)  { dynamic {|s,c|
        if @dedents.nil? or @dedents.length == 0
          NeverMatch.new("Not a dedent")
        else
          @dedents.pop
          AlwaysMatch.new
        end
      }}
    
      def checkIndentation(source, ctx)
        # See if next line starts with indentation. If so, consume it and then process
        # whether it is an indent or some number of dedents.
        indent = ""
        while source.matches?(Regexp.new("[ \t]"))
          indent += source.consume(1).to_s #returns a Slice
        end
    
        if @indentStack.nil?
          @indentStack = [""]
        end
    
        currentInd = @indentStack[-1]
        return AlwaysMatch.new if currentInd == indent #no change, just match nl
    
        if indent.start_with?(currentInd)
          # Getting deeper
          @indentStack << indent
          @indent = indent #tells the indent rule to match one
          return AlwaysMatch.new
        else
          # Either some number of de-dents or an error
    
          # Find first match starting from back
          count = 0
          @indentStack.reverse.each do |level|
            break if indent == level #found it, 
    
            if level.start_with?(indent)
              # New indent is prefix, so we de-dented this level.
              count += 1
              next
            end
    
            # Not a match, not a valid prefix. So an error!
            return ErrorMatch.new("Mismatched indentation level")
          end
    
          @dedents = [] if @dedents.nil?
          count.times { @dedents << @indentStack.pop }
          return AlwaysMatch.new
        end
      end
      rule(:nl)         { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}
    
      rule(:unixnl)     { str("\n") }
      rule(:macnl)      { str("\r") }
      rule(:winnl)      { str("\r\n") }
      # Try "\r\n" first; otherwise macnl consumes the "\r" and leaves a stray "\n".
      rule(:anynl)      { winnl | unixnl | macnl }
    
    end
    
    I'm sure a lot of this could be improved, but this is what I've come up with so far.

    Example usage:

    class MyParser < IndentParser
      rule(:colon)      { str(':') >> space? }
    
      rule(:space)      { match('[ \t]').repeat(1) }  # character class: a space or a tab
      rule(:space?)     { space.maybe }
    
      rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
      rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }
    
      rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
      rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
      rule(:testblock)  { identifier.as(:name) >> block }
    
      rule(:prgm)       { testblock >> nl.repeat }
      root :prgm
    end
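
The indent-stack bookkeeping that checkIndentation performs can also be tried in isolation, which makes it easier to see when an INDENT or DEDENT token is produced. The following is a minimal sketch in plain Ruby, not the answer's Parslet code: indent_tokens is a hypothetical name, tokens are represented as plain symbols and arrays, and emitting closing dedents at end of input is an assumption added here for convenience.

```ruby
# Hypothetical standalone sketch of the indent-stack idea; it does not use Parslet.
def indent_tokens(text)
  stack = [""] # indentation strings of the currently open blocks
  tokens = []

  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?

    indent = line[/\A[ \t]*/]
    current = stack.last

    if indent != current
      if indent.start_with?(current)
        # Deeper than before: open exactly one new block.
        stack << indent
        tokens << :indent
      else
        # Shallower: close blocks until one matches this indentation exactly.
        while stack.last != indent
          unless stack.last.start_with?(indent)
            raise ArgumentError, "Mismatched indentation level: #{line.inspect}"
          end
          stack.pop
          tokens << :dedent
        end
      end
    end

    tokens << [:line, line.strip]
  end

  # Assumption: close any blocks still open at end of input.
  (stack.length - 1).times { tokens << :dedent }
  tokens
end

puts indent_tokens("a:\n  b\n  c:\n    d\ne\n").inspect
```

As in the answer's code, indentation is compared as a string prefix rather than as a count, so a line whose indentation is not a prefix-compatible match for any open block is reported as an error instead of being silently accepted.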
    