An idiomatic Scala way to deserialize delimited strings into case classes

Tags: scala, scalaz, shapeless

Suppose I am dealing with a simple colon-delimited text protocol that looks like this:

Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
OPC:m3node:1-10-2(P):A7:NAT0
....
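I would like to deserialize each line into an instance of a case class, but in a type-safe way. My first attempt uses a type class to define a "read" method for each type I may encounter, plus the "tupled" method on the case class companion to get a function that can be applied to a tuple of arguments, as follows: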

case class Foo(a: String, b: Integer)

trait Reader[T] {
  def read(s: String): T
}

object Reader {
  implicit object StringParser extends Reader[String] { def read(s: String): String = s }
  implicit object IntParser extends Reader[Integer] { def read(s: String): Integer = s.toInt }
}

def create[A1, A2, Ret](fs: Seq[String], f: ((A1, A2)) => Ret)(implicit A1Reader: Reader[A1], A2Reader: Reader[A2]): Ret = {
  f((A1Reader.read(fs(0)), A2Reader.read(fs(1))))
}

create(Seq("foo", "42"), Foo.tupled) // gives me a Foo("foo", 42)
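The problem is that I need to define create for every tuple and function arity, which means up to 22 versions of create. In addition, this does not handle validation or corrupt input.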

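Since the question has a shapeless tag, here is a possible solution using it. I am not an expert, though, and I suspect it can be done better: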

First, regarding the lack of validation: simply have read return a Try, a scalaz.Validation, or just an Option if you do not care about the error message.
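A minimal, standard-library-only sketch of that first suggestion, before any shapeless is involved. The names SafeReader, readPair and TryReaderSketch are hypothetical and chosen only for this illustration; the sketch handles exactly two fields, so it shows the validation idea but does not solve the arity problem.

```scala
import scala.util.{Failure, Success, Try}

object TryReaderSketch {
  // SafeReader and readPair are hypothetical names used only for this sketch.
  trait SafeReader[A] {
    def read(s: String): Try[A]
  }

  object SafeReader {
    implicit val stringReader: SafeReader[String] = new SafeReader[String] {
      def read(s: String): Try[String] = Success(s)
    }
    implicit val intReader: SafeReader[Int] = new SafeReader[Int] {
      // Try captures the NumberFormatException thrown by toInt on bad input
      def read(s: String): Try[Int] = Try(s.toInt)
    }
  }

  // Reads exactly two colon-separated fields; the first failure short-circuits.
  def readPair[A, B](line: String)(implicit ra: SafeReader[A], rb: SafeReader[B]): Try[(A, B)] =
    line.split(":", -1) match {
      case Array(a, b) =>
        for {
          x <- ra.read(a)
          y <- rb.read(b)
        } yield (x, y)
      case fields =>
        Failure(new IllegalArgumentException(s"expected 2 fields, got ${fields.length}"))
    }

  case class Foo(a: String, b: Int)

  val ok: Try[Foo]  = readPair[String, Int]("foo:42").map { case (a, b) => Foo(a, b) }
  val bad: Try[Foo] = readPair[String, Int]("foo:bar").map { case (a, b) => Foo(a, b) }
}
```

Here ok is Success(Foo(foo,42)), while bad is a Failure wrapping the NumberFormatException, so corrupt input becomes a value the caller can inspect instead of an exception thrown mid-parse.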

Then, regarding the boilerplate, you can try using an HList. That way you do not need to write a version for every arity.

import scala.util._
import shapeless._

trait Reader[+A] { self =>
  def read(s: String) : Try[A]
  def map[B](f: A => B): Reader[B] = new Reader[B] {
    def read(s: String) = self.read(s).map(f)
  }
}    

object Reader {
  // convenience
  def apply[A: Reader] : Reader[A] = implicitly[Reader[A]]
  def read[A: Reader](s: String): Try[A] = implicitly[Reader[A]].read(s)

  // base types
  implicit object StringReader extends Reader[String] {
    def read(s: String) = Success(s)
  }
  implicit object IntReader extends Reader[Int] {
    def read(s: String) = Try {s.toInt}
  }

  // HLists, parts separated by ":"
  implicit object HNilReader extends Reader[HNil] {
    def read(s: String) = 
      if (s.isEmpty()) Success(HNil) 
      else Failure(new Exception("Expect empty"))
  }
  implicit def HListReader[A : Reader, H <: HList : Reader] : Reader[A :: H] 
  = new Reader[A :: H] {
    def read(s: String) = {
      val (before, colonAndBeyond) = s.span(_ != ':')
      val after = if (colonAndBeyond.isEmpty()) "" else colonAndBeyond.tail
      for {
        a <- Reader.read[A](before)
        b <- Reader.read[H](after)
      } yield a :: b
    }
  }

}
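The sample call that follows needs an implicit Reader[Foo], which the code above does not define. A minimal sketch of how one might be derived, assuming shapeless 2.x is on the classpath and the Reader trait and companion object defined above are in scope. Foo's field order (an Int, then a String) is inferred from the output shown, and fooReader is a name chosen for illustration.

```scala
import shapeless._

case class Foo(a: Int, s: String)

object Foo {
  // Parse the line as Int :: String :: HNil, then let shapeless' Generic
  // convert that HList into a Foo.
  implicit val fooReader: Reader[Foo] =
    Reader[Int :: String :: HNil].map(Generic[Foo].from _)
}
```

Placing the instance in Foo's companion object puts it in implicit scope, so Reader.read[Foo] can find it without an extra import.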
With an implicit Reader[Foo] in scope (here Foo's fields are an Int followed by a String), it works:

println(Reader.read[Foo]("12:text"))
Success(Foo(12,text))

Without scalaz and shapeless, I think the idiomatic Scala way to parse input is Scala parser combinators. For your example, I would try something like this:

import org.joda.time.DateTime
import scala.util.parsing.combinator.JavaTokenParsers

val input =
  """Event:005003:information:2013 12 06 12 37 55:n3.swmml20861:1:Full client swmml20861 registered [entry=280 PID=20864 queue=0x4ca9001b]
    |RSET:m3node:AUTRS:1-1-24:A:0:LOADSHARE:INHIBITED:0
    |M3UA_IP_LINK:m3node:AUT001LKSET1:AUT001LK1:r
    |OPC:m3node:1-10-2(P):A7:NAT0""".stripMargin

trait LineContent
case class Event(number : Int, typ : String, when : DateTime, stuff : List[String]) extends LineContent
case class Reset(node : String, stuff : List[String]) extends LineContent
case class Other(typ : String, stuff : List[String]) extends LineContent

object LineContentParser extends JavaTokenParsers {
  override val whiteSpace=""":""".r

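  val space="""\s+""".r
  // A line ends at a newline (the original pattern had a stray leading quote character)
  val lineEnd = """\r?\n""".r
  // Fields are non-empty and stay on one line, so rep(field) makes progress and stops at the line end
  val field = """[^:\r\n]+""".r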

  def stuff : Parser[List[String]] = rep(field)
  def integer : Parser[Int] = log(wholeNumber ^^ {_.toInt})("integer")

  def date : Parser[DateTime] = log((repsep(integer, space)  filter (_.length == 6))  ^^ (l =>
      new DateTime(l(0), l(1), l(2), l(3), l(4), l(5), 0)
    ))("date")

  def event : Parser[Event] = "Event" ~> integer ~ field ~ date ~ stuff ^^ {
    case number~typ~when~stuff => Event(number, typ, when, stuff)}

  def reset : Parser[Reset] = "RSET" ~> field ~ stuff ^^ { case node~stuff =>
    Reset(node, stuff)
  }

  def other : Parser[Other] = ("M3UA_IP_LINK" | "OPC") ~ stuff ^^ { case typ~stuff =>
    Other(typ, stuff)
  }

  def line : Parser[LineContent] = event | reset | other
  def lines = repsep(line, lineEnd)

  def parseLines(s : String) = parseAll(lines, s)
}

LineContentParser.parseLines(input)
trying integer at scala.util.parsing.input.CharSequenceReader@108589b
integer --> [1.13] parsed: 5003
trying date at scala.util.parsing.input.CharSequenceReader@cec2e3
trying integer at scala.util.parsing.input.CharSequenceReader@cec2e3
integer --> [1.30] parsed: 2013
trying integer at scala.util.parsing.input.CharSequenceReader@14da3
integer --> [1.33] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader@1902929
integer --> [1.36] parsed: 6
trying integer at scala.util.parsing.input.CharSequenceReader@17e4dce
integer --> [1.39] parsed: 12
trying integer at scala.util.parsing.input.CharSequenceReader@1747fd8
integer --> [1.42] parsed: 37
trying integer at scala.util.parsing.input.CharSequenceReader@1757f47
integer --> [1.45] parsed: 55
date --> [1.45] parsed: 2013-12-06T12:37:55.000+01:00
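The pattern in parser combinators is self-explanatory. I always convert each successfully parsed chunk into a partial result as early as possible. The partial results are then combined into the final result.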

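Hint for debugging: you can always wrap a rule in the log parser. It prints a message before and after the rule is applied. Along with the given name (e.g. "date"), it also prints the current position in the input source, where the rule is applied, and the parsed partial result, if applicable. The trace printed after the parseLines call above is sample output from the integer and date rules.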


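I think this is an easy and maintainable way to parse input into well-typed Scala objects. It is all in the core Scala API, hence I would call it "idiomatic" (since Scala 2.11 the parser combinators ship as the separate scala-parser-combinators module). When typing the example code into an IDEA Scala worksheet, completion and type information worked very well, so this approach seems to be well supported by IDEs.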

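I really like your solution, but if I understand correctly it locks you into a specific case class size, doesn't it? Have you found a more generic approach?

Not sure what you mean by a specific case class size. With case class Bar(a: Int, b: Int, c: String) you can use Reader[Int :: Int :: String :: HNil].map(Generic[Bar].from _). However, this is more a sketch than a complete solution. For instance, it will not handle variant types at all. As I said, I am not very familiar with shapeless, and this may not be the best approach. Also, shapeless 2.0 has just come out and may make everything simpler.

After parsing I have a list of strings, and I am trying to convert it into a provided case class with as little boilerplate as possible for the user. With the solution you describe above (the best one I have found), the user of the lib still needs to write some code explaining how the strings are converted into their case class, as you showed in your last comment. If possible, I would like to hear suggestions on how to avoid that. Not supporting variant types would not be a problem in my case.

I am not sure I understand what you are looking for. Maybe you should ask a new question, since that way you will get better answers than mine :)

I followed your advice :-)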