Scala: How to debug an Akka Streams pipeline?


I am trying to build a processing pipeline for log files, where a log line looks like

2005-05-06 14:58:57 1 45.23.4.218 304 TCP_HIT 542 1109 GET http sports.espn.go.com /crossdomain.xml - - DIRECT 199.181.132.141 text/xml "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)" PROXIED Sports/Recreation/Hobbies - 192.16.170.44 SG-HTTP-Service - none -
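For this particular log file, I have about 193705 log lines to process (195765 lines in total, minus the comment lines that start with #):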

Harits-MacBook-Pro-2:bluecoat_proxy_big harit$ wc -l Demo_log_004.log 
  195765 Demo_log_004.log
Harits-MacBook-Pro-2:bluecoat_proxy_big harit$ grep -E "GET|POST|CONNECT" Demo_log_004.log | wc -l
  192197
Harits-MacBook-Pro-2:bluecoat_proxy_big harit$ wc -l Demo_log_004.log 
  195765 Demo_log_004.log
Harits-MacBook-Pro-2:bluecoat_proxy_big harit$ grep -v "^#" Demo_log_004.log | wc -l
  193705
Harits-MacBook-Pro-2:bluecoat_proxy_big harit$ 
I created a flow graph that initially looked like

  source ~> byteStringToString ~> filterComments ~> splitLogLine ~> broadcast ~> transformEvent ~> sinkEvents
                                                                    broadcast ~> obfuscateIpAddress ~> sinkAssets
But then I realized that far fewer lines were arriving in the sink:

$ java -cp processor/target/lib:processor/target/processor-1.0-SNAPSHOT.jar com.learner.processor.LogFile | tee out.log

$ wc -l out.log 
   15070 out.log
So I linearized my graph to make sure that all lines make it through the pipeline. My current code looks like

import java.io.File

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.io.Implicits.AddSynchronousFileSource
import akka.stream.scaladsl.FlowGraph.Builder
import akka.stream.scaladsl._
import akka.util.ByteString
import com.learner.messages.BlueCoatEvent
import com.learner.processor.Flows.byteStringToString
import com.typesafe.scalalogging.Logger
import org.slf4j.LoggerFactory

import scala.concurrent.Future
import scala.util.hashing.MurmurHash3.stringHash

object LogFile {
  val maxBytesPerLine = 1500
  implicit val system = ActorSystem("system")

  def apply(file: File) = new LogFile(file)

  def main(args: Array[String]): Unit = {
    LogFile(new File("/Users/harit/Downloads/bluecoat_proxy_big/Demo_log_004.log")).processGraph()
  }
}

class LogFile(file: File)(implicit val system: ActorSystem) {
  Predef.assert(file.exists(), "log file must exists")

  implicit val materializer = ActorMaterializer()
  val logger = Logger(LoggerFactory.getLogger(getClass))

  val source: Source[ByteString, Future[Long]] = Source.synchronousFile(file)


  def processGraph() = {
    val sinkEvents = Sink.foreach(println)
    val sinkAssets = Sink.ignore

    val filterComments = Flow[String].filter(!_.startsWith("#"))
    val splitLogLine = Flow[String].map(_.split("\\s").toList)
    val transformEvent = Flow[List[String]].map(tokens => BlueCoatEvent(tokens))
    val obfuscateIpAddress = Flow[List[String]].map(tokens => Map[String, String](tokens(3) -> stringHash(tokens(3)).toString))


    FlowGraph.closed() { implicit builder: Builder[Unit] =>
      import FlowGraph.Implicits._

      source ~> byteStringToString ~> sinkEvents
    }.run()

  }
}

When I run my program again, it again produces a number of lines close to the one above (not exactly the same).

I am confused and would like some help here:

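  • The flow does not fail or throw an exception, yet it still produces fewer output lines. How do I debug this?
  • The maximum frame length is 1500, but log lines can be longer than that (more characters). Would that be a problem? If so, how do I fix it?
  • How do I confirm it? As per my code, I do not get a Materializer back when calling run(), so I cannot shut down the ActorSystem. What am I missing?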
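Update

I just found out that the stream stops at a line containing 2564 characters (bytes). That is, it stops at line 15071, which is 2564 bytes long.

But why does it not throw an exception? How do I handle this?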
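One way to check the frame-length suspicion independently of the stream is to scan the file for lines whose encoded size exceeds the limit. The following is a minimal sketch in plain Scala. The names `LongLineCheck` and `oversizedLines` are made up for this illustration, and it assumes the log is UTF-8 encoded.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object LongLineCheck {
  // Returns (1-based line number, byte length) for every line
  // whose UTF-8 encoding is longer than maxBytes.
  def oversizedLines(lines: Iterator[String], maxBytes: Int): List[(Int, Int)] =
    lines.zipWithIndex
      .map { case (line, idx) => (idx + 1, line.getBytes(UTF_8).length) }
      .filter { case (_, bytes) => bytes > maxBytes }
      .toList

  def main(args: Array[String]): Unit = {
    // Synthetic input standing in for the log file (not real log data).
    val sample = Iterator("short line", "x" * 2564, "another short line")
    oversizedLines(sample, maxBytes = 1500).foreach { case (n, bytes) =>
      println(s"line $n is $bytes bytes") // prints "line 2 is 2564 bytes"
    }
  }
}
```

For the real file, the iterator could come from `scala.io.Source.fromFile(path, "UTF-8").getLines()`. If the first oversized line reported matches the line where the output stops, the frame limit is the likely culprit.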

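Comments:

  • First, I'd await the future returned by the sink until it completes, and only then terminate the application. To get that future, materialize the stream to the sink's value (you need to pass the sink to the graph builder).
  • Ideally, I would not terminate the application, since it will run forever and process log files. But as I see it, IntersperseStage should send the elements separated by \n, right?
  • IntersperseStage: yes, it inserts the given element between every two elements. Though I'm not sure how that relates to your question :)
  • No, no, it doesn't ;-). I am learning Scala and Akka, so I am reading the code to make sure I understand it.
  • There is a recover option that may be useful on the output of processGraph(). It is covered in an SO question and in the official documentation.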
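The first comment's advice can be sketched as follows. In the question's graph the future materialized by the sink is never looked at, so a stream failure would go unnoticed. A likely cause is the framing stage failing the stream once a line exceeds maximumFrameLength. This sketch assumes the newer Akka 2.6 API (`FileIO.fromPath`, materializer taken from the implicit `ActorSystem`) rather than the 1.0 API used in the question. `FrameCheck` and `lineCount` are hypothetical names.

```scala
import java.nio.file.{Path, Paths}

import akka.actor.ActorSystem
import akka.stream.scaladsl.{FileIO, Framing, Sink}
import akka.util.ByteString

import scala.concurrent.Future
import scala.util.{Failure, Success}

object FrameCheck {
  // Counts the newline-delimited frames in a file. The returned Future fails
  // if the stream fails, e.g. when a line is longer than maxFrame bytes.
  def lineCount(path: Path, maxFrame: Int)(implicit system: ActorSystem): Future[Int] =
    FileIO.fromPath(path)
      .via(Framing.delimiter(ByteString("\n"), maximumFrameLength = maxFrame, allowTruncation = true))
      .runWith(Sink.fold[Int, ByteString](0)((count, _) => count + 1))

  def main(args: Array[String]): Unit = {
    // Assumption: Akka 2.6+, where the implicit system also supplies the materializer.
    implicit val system: ActorSystem = ActorSystem("frame-check")
    import system.dispatcher // execution context for the callback below

    lineCount(Paths.get(args(0)), maxFrame = 1500).onComplete { result =>
      result match {
        case Success(n) => println(s"stream completed, $n lines")
        case Failure(e) => println(s"stream failed: $e")
      }
      system.terminate() // shut down only after the stream has finished
    }
  }
}
```

Observing the future makes the failure visible. Raising `maxFrame` above the longest expected line (for example 4096 for the 2564-byte line) should then let the whole file through. The `recover` operator mentioned in the last comment is an alternative that emits one final element on failure and then completes the stream.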
    
The byteStringToString flow imported from com.learner.processor.Flows is defined as:

      val byteStringToString: Flow[ByteString, String, Unit] = Flow[ByteString]
        .via(Framing.delimiter(ByteString(System.lineSeparator), maximumFrameLength = LogFile.maxBytesPerLine, allowTruncation = true))
        .map(_.utf8String)
    
Lines 15070 and 15071 of the log file (each prefixed with its line number):

    15070 2005-05-06 14:58:57 42 45.23.4.218 200 TCP_NC_MISS 903 1098 GET http sports.espn.go.com /nba/xml/upcomingTV ?sport=nba - DIRECT sports.espn.go.com text/html;%20charset=iso-8859-1 "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)" PROXIED Sports/Recreation/Hobbies - 192.16.170.44 SG-HTTP-Service - none -
    15071 2005-05-06 14:58:57 306 45.23.4.218 200 TCP_MISS 16641 2140 GET http m3.doubleclick.net /872526/match2fb_728x90v2.swf ?clickTag=http%253A//ad.doubleclick.net/click%25253Bh%253Dv3%257C3270%257Cf%257C6e%257C%25252a%257Ct%25253B14910813%25253B0-0%25253B0%25253B11166676%25253B3454-728%257C90%25253B9369149%257C9387045%257C1%25253B%25253B%25257Esscs%25253D%25253fhttp%253A//log.go.com/log%253Fsrvc%25253dsz%252526guid%25253d5504B8AD-0FC2-4475-9203-4CE6D2125953%252526drop%25253d0%252526addata%25253d0%253A63%253A188329%253A65%252526a%25253d1%252526goto%25253dhttp%25253a%25252f%25252fwww.levitra.com/match/levitra_promotions/match/get/forms.jsp%25253Frotation%25253D11166676%252526banner%25253D14910813&clickTag1=http%253A//log.go.com/log%253Fsrvc%25253dsz%252526guid%25253d5504B8AD-0FC2-4475-9203-4CE6D2125953%252526drop%25253d0%252526addata%25253d0%253A63%253A188329%253A65%252526a%25253d1%252526goto%25253dhttp%253A//ad.doubleclick.net/click%25253Bh%253Dv3%257C3270%257Cf%257C6e%257C%25252a%257Ct%25253B14910813%25253B0-0%25253B0%25253B11166676%25253B3454-728%257C90%25253B9369149%257C9387045%257C1%25253B%25253B%25257Esscs%25253D%25253fhttp%253A//log.go.com/log%253Fsrvc%25253dsz%252526guid%25253d5504B8AD-0FC2-4475-9203-4CE6D2125953%252526drop%25253d0%252526addata%25253d0%253A63%253A188329%253A65%252526a%25253d1%252526goto%25253dhttp%25253a%25252f%25252fwww.levitra.com/match/levitra_promotions/match/get/forms.jsp%25253Frotation%25253D11166676%252526banner%25253D14910813&clickTag2=http%253A//log.go.com/log%253Fsrvc%25253dsz%252526guid%25253d5504B8AD-0FC2-4475-9203-4CE6D2125953%252526drop%25253d0%252526addata%25253d0%253A63%253A188329%253A65%252526a%25253d1%252526goto%25253dhttp%253A//ad.doubleclick.net/click%25253Bh%253Dv3%257C3270%257Cf%257C6e%257C%25252a%257Ct%25253B14910813%25253B0-0%25253B0%25253B11166676%25253B3454-728%257C90%25253B9369149%257C9387045%257C1%25253B%25253B%25257Esscs%25253D%25253fhttp%253A//log.go.com/log%253Fsrvc%25253dsz%252526guid%25253d5504B8AD-0FC2-4475-9203-4CE6D2125953%252526drop%25253d0%252526addata%25253d0%253A63%253A188329%253A65%252526a%25253d1%252526goto%25253dhttp%253A//www.levitra.com/consumer/about_levitra/levitra_side_effects.htm%253Frotation%253D11166676%2526banner%253D14910813&clickTag3=&clickTag4=&clickTag5= - DIRECT m3.doubleclick.net application/x-shockwave-flash "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322)" PROXIED Web%20Advertisements - 192.16.170.44 SG-HTTP-Service - none -
    
    $ wc -l out.log 
       15070 out.log