Parser in Ruby: dealing with sticky comments and quotes

I am trying to write a recursive-descent parser in Ruby for a grammar defined by the following rules:

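Input consists of white-space separated Cards, each starting with a Stop-word, where white-space is the regex /[\n\t]+/.
A Card may consist of Keywords and/or Values, also separated by white-space, in a card-specific order/pattern.
All Stop-words and Keywords are case-insensitive and match /^[a-z]+[a-z0-9]*$/i.
A Value can be a double-quoted string, which need not be separated from other words by white-space, e.g.: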

word"quoted string"word
A Value can also be a word matching /^[a-z]+[a-z0-9]*$/, an integer, or a float, e.g. -1.15 or 1.0e+2.

Single-line comments are denoted by # and need not be separated from other words, e.g.:

word#single-line comment\n
Multi-line comments are denoted by /* and */ and need not be separated from other words, e.g.:

word/*multi-line 
comment*/word
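Since most words are separated by white-space, for a while I was thinking about splitting the whole input and parsing it word by word. To deal with comments and quotes, I was going to do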

words = input_text.gsub( /([\"\#\n]|\/\*|\*\/)/, ' \1 ' ).split( /[ \t]+/ )

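However, this modifies the contents of strings and comments, which is a problem if I want to keep them intact. How would you deal with these sticky comments and quotes?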
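To make the problem concrete, here is a small sketch that applies the substitution above to a made-up input (the sample string is an assumption, not the real data) in which a quoted string contains comment markers:

```ruby
# Hypothetical sample input: a quoted string that contains comment markers.
input_text = 'set title "a#b /*c*/"'

# The substitution from the question: pad every delimiter with spaces, then split.
words = input_text.gsub(/(["#\n]|\/\*|\*\/)/, ' \1 ').split(/[ \t]+/)

p words
# prints ["set", "title", "\"", "a", "#", "b", "/*", "c", "*/", "\""]

# The quoted string has been cut into pieces, and re-joining the pieces
# does not give the original text back.
p words.join(' ') == input_text
# prints false
```

The string is split at markers that were only part of its contents, and the original spacing inside it can no longer be recovered.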

OK, I did it myself. The following code can be shortened if readability is not a concern.

class WordParser
  attr_reader :words

  def initialize text
    @text = text
  end

  def parse
    reset_parser
    until eof?
      case curr_char
        when '"' then
          start_word and add_chars_until? '"'
          close_word
        when '#','%' then
          start_word and add_chars_until? "\n"
          close_word
        when '/' then
          if next_is? '*' then
            start_word and 2.times { add_char }
            add_char until curr_is? '*' and next_is? '/' or eof?
            2.times { add_char } unless eof?
            close_word
          else
            # parser_error "unexpected symbol '/'" # if not allowed in the grammar
            start_word unless word_already_started?
            add_char
          end
        when /[^\s]/ then
          start_word unless word_already_started?
          add_char
        else # skip whitespace between words
          move
          close_word
      end
    end
    return @words
  end

private

  def reset_parser
    @position = 0
    @line, @column = 1, 1
    @words = []
    @word_started = false
  end

  def parser_error s
    # Report where the parser stopped, then abort.
    Kernel.puts 'Parser error on line %d, col %d: %s' % [@line, @column, s]
    raise 'Parser error'
  end

  def word_already_started?
    @word_started
  end

  def close_word
    @word_started = false
  end

  def add_chars_until? ch
    add_char until next_is? ch or eof?
    2.times { add_char } unless eof?
  end

  def add_char
    @words.last[:to] = @position
    # @words.last[:length] += 1
    # @words.last << curr_char # if one just collects words
    move
  end

  def start_word
    @words.push from: @position, to: @position, line: @line, column: @column
    # @words.push '' unless @words.last.empty? # if one just collects words
    @word_started = true
  end

  def move
    increase :@position
    return if eof?
    if prev_is? "\n"
      increase :@line
      reset :@column
    else
      increase :@column
    end
  end

  def reset var; instance_variable_set(var, 1) end
  def increase var; instance_variable_set(var, instance_variable_get(var)+1) end

  def eof?; @position >= @text.length end

  def prev_is? ch; prev_char == ch end
  def curr_is? ch; curr_char == ch end
  def next_is? ch; next_char == ch end

  def prev_char; @text[ @position-1 ] end
  def curr_char; @text[ @position   ] end
  def next_char; @text[ @position+1 ] end
end
Testing with the example from the question:

words = WordParser.new(text).parse
p words.collect { |w| text[ w[:from]..w[:to] ] } .to_a

# >> ["# Input example. Stop-words are chosen just to highlight them: set, object\n", 
# >>  "set", "title", "\"Input example\"", "set", "objects", "2", 
# >>  "#not-separated by white-space. test: \"/*\n", "set", "test", "\"#/*\"", 
# >>  "object", "1", "shape", "box", "/* shape is a Keyword, \nbox is a Value. test: \"#*/", 
# >>  "object", "2", "shape", "sphere", "set", "data", "# message and complete are Values\n", 
# >>  "0", "0", "0", "0", "1", "18", "18", "18", "1", "35", "35", "35", "72", 
# >>  "35", "35", "# all numbers are Values of the Card \"set\"\n"]

So now I can use it to parse the words further.
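As a possible next step, each extracted word could be labelled by kind before the card-level parsing. This is only a sketch: `classify` is a hypothetical helper, the number patterns are my reading of the examples -1.15 and 1.0e+2, and telling Stop-words from Keywords still depends on the card, so both come out as :word here.

```ruby
# Hypothetical helper: label one extracted word according to the rules in the question.
def classify(word)
  case word
  when /\A"/            then :string   # starts with a double quote
  when /\A#/, %r{\A/\*} then :comment  # single-line or multi-line comment
  when /\A[+-]?\d+\z/   then :integer
  when /\A[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:e[+-]?\d+)?\z/i then :float
  when /\A[a-z]+[a-z0-9]*\z/i then :word # Stop-word, Keyword or bare Value
  else :unknown
  end
end

p %w[set 18 -1.15 1.0e+2].map { |w| classify(w) }
# prints [:word, :integer, :float, :float]
```

The integer pattern is tried before the float pattern so that plain digits such as 18 are not reported as floats.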

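On a related note, I don't think splitting text on white-space is a good idea for parsing anything but the simplest grammars. I don't want to write an essay here on creating parsers. In any case, try searching for "compiler ruby" or "parser generator ruby". One example is Treetop.

Treetop is a bit hard for me to understand. Maybe you could show me how to apply it to my grammar? I thought that for such a simple grammar I could build something myself with the help of SO users.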
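For a grammar this small, a hand-written scanner that matches tokens in place may be enough, with no splitting and no parser generator. Below is a minimal sketch, assuming Ruby's standard `strscan` library; `TOKEN_PATTERNS` and `tokenize` are hypothetical names, and the patterns are guesses read off the rules in the question rather than a complete grammar.

```ruby
require 'strscan'

# Token patterns, tried in this order at the current scan position.
# Assumed from the question's rules; adjust to the real grammar.
TOKEN_PATTERNS = [
  [:comment, %r{#[^\n]*\n?|/\*.*?\*/}m],
  [:string,  /"[^"]*"/],
  [:number,  /[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:e[+-]?\d+)?/i],
  [:word,    /[a-z][a-z0-9]*/i]
].freeze

def tokenize(text)
  scanner = StringScanner.new(text)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/) # white-space only separates tokens
    # The first pattern that matches here consumes the token.
    type, _pattern = TOKEN_PATTERNS.find { |_, pattern| scanner.scan(pattern) }
    raise "unexpected character at offset #{scanner.pos}" unless type
    tokens << [type, scanner.matched]
  end
  tokens
end

p tokenize(%(word"quoted string"word#note\nset/*a\nb*/1.0e+2))
```

Because every pattern is anchored at the scanner's current position, strings and comments are consumed whole, so their contents are never touched and they may sit directly next to other words.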