Parsing an indentation-based language using Scala parser combinators


Is there a convenient way to use Scala's parser combinators to parse languages where indentation is significant (e.g. Python)?

Suppose we have a very simple language in which the following is a valid program:

block
  inside
  the
  block

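We want to parse it into a List[String], with each line inside the block as one String.

We first define a method that takes a minimum indentation level and returns a parser for a line indented by at least minIndent + 1 whitespace characters: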
def line(minIndent:Int):Parser[String] = 
  repN(minIndent + 1,"\\s".r) ~ ".*".r ^^ {case s ~ r => s.mkString + r}

Then we define a block with a minimum indentation level by repeating the line parser, with a suitable separator between lines:
def lines(minIndent:Int):Parser[List[String]] =
  // Accept \r\n, \n or \r as the separator, treating \r\n as a single line break
  rep1sep(line(minIndent), "\r?\n|\r".r)

Now we can define a parser for our little language like this:
val block:Parser[List[String]] =
  (("\\s*".r <~ "block\\n".r) ^^ { _.size }) >> lines

Running it, we get:
[4.10] parsed: List(    inside,     the,     block)

To compile all of this, you need these imports:
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.CharSequenceReader

You also need to put everything into an object that extends RegexParsers, like this:

object MyParsers extends RegexParsers {
  override def skipWhitespace = false
  ....
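
Assembled, the pieces above might look like the following minimal sketch. The object name IndentedBlockParser and the helper parseBlock are hypothetical names chosen for illustration, the separator regex is written as \r?\n|\r so that a Windows line ending is consumed as one separator, and the scala-parser-combinators module is assumed to be on the classpath.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object; the name is not from the original answer.
object IndentedBlockParser extends RegexParsers {
  // Whitespace is significant here, so the parser must not skip it.
  override def skipWhitespace = false

  // A line indented by at least minIndent + 1 whitespace characters.
  def line(minIndent: Int): Parser[String] =
    repN(minIndent + 1, "\\s".r) ~ ".*".r ^^ { case s ~ r => s.mkString + r }

  // One or more such lines, separated by line breaks.
  def lines(minIndent: Int): Parser[List[String]] =
    rep1sep(line(minIndent), "\r?\n|\r".r)

  // The width of the whitespace before "block" becomes the minimum indentation.
  val block: Parser[List[String]] =
    (("\\s*".r <~ "block\\n".r) ^^ { _.size }) >> lines

  // Hypothetical convenience helper: succeeds only if the whole input is a block.
  def parseBlock(input: String): ParseResult[List[String]] =
    parseAll(block, input)
}
```

With the sample program from the question (two-space indentation), IndentedBlockParser.parseBlock("block\n  inside\n  the\n  block").get should yield the three indented lines with their leading whitespace preserved. Because parseBlock uses parseAll, input that continues with less-indented lines after the block is reported as a failure rather than silently ignored.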

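As far as I know, Scala parser combinators do not support this kind of thing out of the box. You can certainly do it by parsing whitespace in a meaningful way, but you will run into some problems, because you need some form of state machine to keep track of the indentation stack.
I would recommend a preprocessing step instead. Here is a small preprocessor that inserts tokens to delimit the indented blocks: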
object Preprocessor {

    val BlockStartToken = "{"
    val BlockEndToken = "}"

    val TabSize = 4 //how many spaces does a tab take

    def preProcess(text: String): String = {
        val lines = text.split('\n').toList.filterNot(_.forall(isWhiteChar))
        val processedLines = BlockStartToken :: insertTokens(lines, List(0))
        processedLines.mkString("\n")
    }

    def insertTokens(lines: List[String], stack: List[Int]): List[String] = lines match {
        case List() => List.fill(stack.length) { BlockEndToken } //closing all opened blocks
        case line :: rest => {
            (computeIndentation(line), stack) match {
                case (indentation, top :: stackRest) if indentation > top => {
                    BlockStartToken :: line :: insertTokens(rest,  indentation :: stack)
                }
                case (indentation, top :: stackRest) if indentation == top =>
                    line :: insertTokens(rest, stack)
                case (indentation, top :: stackRest) if indentation < top => {
                    BlockEndToken :: insertTokens(lines, stackRest)
                }
                case _ => throw new IllegalStateException("Invalid algorithm")
            }
        }
    }


    private def computeIndentation(line: String): Int = {
        val whiteSpace = line takeWhile isWhiteChar
        (whiteSpace map {
            case ' ' => 1
            case '\t' => TabSize
        }).sum
    }

    private def isWhiteChar(ch: Char) = ch == ' ' || ch == '\t'
}
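
Note that insertTokens above recurses once per input line and is not tail-recursive, so very long inputs could exhaust the call stack. As a hedged alternative, the same indentation-stack logic can be sketched as a fold. The name IterativePreprocessor and its internal structure are assumptions made for illustration, not part of the original answer.

```scala
// Hypothetical iterative variant of the preprocessor; same tokens and tab width.
object IterativePreprocessor {
  val BlockStartToken = "{"
  val BlockEndToken = "}"
  val TabSize = 4 // how many spaces a tab counts for

  private def isWhiteChar(ch: Char): Boolean = ch == ' ' || ch == '\t'

  private def indentationOf(line: String): Int =
    line.takeWhile(isWhiteChar).map(ch => if (ch == '\t') TabSize else 1).sum

  def preProcess(text: String): String = {
    val lines = text.split('\n').toList.filterNot(_.forall(isWhiteChar))
    // State: (output lines in reverse order, indentation stack with the current level on top).
    val start = (List(BlockStartToken), List(0))
    val (revOut, finalStack) = lines.foldLeft(start) { case ((out, stack), line) =>
      val indent = indentationOf(line)
      // Pop every level that is deeper than this line; the bottom 0 is never popped.
      val closed = stack.dropWhile(_ > indent)
      val afterClose = List.fill(stack.length - closed.length)(BlockEndToken) ::: out
      if (closed.headOption.exists(_ == indent)) (line :: afterClose, closed)
      else (line :: BlockStartToken :: afterClose, indent :: closed) // deeper: open a block
    }
    // Close whatever is still open, then restore the original order.
    (List.fill(finalStack.length)(BlockEndToken) ::: revOut).reverse.mkString("\n")
  }
}
```

For example, IterativePreprocessor.preProcess("a\n  b\nc") wraps the whole text in one block and the indented line b in a nested one. Building the output in reverse keeps each step a constant-time prepend.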

Given this input:

val text =
    """
      |line1
      |line2
      |    line3
      |    line4
      |    line5
      |        line6
      |        line7
      |  line8
      |  line9
      |line10
      |   line11
      |   line12
      |   line13
    """.stripMargin
println(Preprocessor.preProcess(text))

... the result is:

{
line1
line2
{
    line3
    line4
    line5
{
        line6
        line7
}
}
{
  line8
  line9
}
line10
{
   line11
   line12
   line13
}
}

After that, you can use the combinator library to do the parsing in a simpler way.
Hope this helps.

Use override val skipWhitespace = false.
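
As a rough illustration of how the brace-delimited text might then be parsed with the combinator library, here is a minimal sketch. The Node, Line and Block types, the BlockParser object and the parseBlocks helper are all hypothetical names introduced for this example, and the scala-parser-combinators module is assumed to be on the classpath. The sketch also assumes that no content line starts with a brace.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical tree types for illustration only.
sealed trait Node
case class Line(text: String) extends Node
case class Block(children: List[Node]) extends Node

object BlockParser extends RegexParsers {
  // skipWhitespace keeps its default (true) here: after preprocessing,
  // the braces carry the structure, so indentation and newlines can be skipped.

  // A content line: starts with a non-brace, non-space character, runs to end of line.
  def line: Parser[Node] = """[^{}\s][^\n]*""".r ^^ { s => Line(s.trim) }

  // A block: "{", then any mix of nested blocks and lines, then "}".
  def block: Parser[Node] = ("{" ~> rep(block | line) <~ "}") ^^ { Block(_) }

  // Hypothetical helper: the whole input must be exactly one block.
  def parseBlocks(text: String): ParseResult[Node] = parseAll(block, text)
}
```

For instance, BlockParser.parseBlocks("{\na\n{\n    b\n}\nc\n}").get should produce a Block containing Line("a"), a nested Block with Line("b"), and Line("c"). Unbalanced braces are reported as a parse failure.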