Apache Spark: about Spark application architecture, long-lived server


apache-spark


In this Cloudera post, they say:

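An application can be used for a single batch job, an interactive session with multiple jobs spaced apart, or a long-lived server continually satisfying requests.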

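I'm interested in the long-lived server that continually satisfies requests: how do I configure Spark to work in that mode? I've written a very simple application that listens on a socket port and runs a job when it receives a command, but I'm not sure this is how it is supposed to work. Any suggestions, posts, or books that could light my way? Thanks, everyone!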

My code is very simple and naive, but here it is:

import java.io.PrintStream
import java.net.ServerSocket
import scala.io.BufferedSource

// Before this line is the code in charge of reading the source files and creating the graph
// (its vertex attribute is (followers_count, lang))
val server = new ServerSocket(9999)
val s = server.accept()
val in = new BufferedSource(s.getInputStream()).getLines()
val out = new PrintStream(s.getOutputStream())

// Serve commands until the client closes the connection
while (in.hasNext) {
  val str = in.next()
  if (str == "filtro") {
    out.println("Starting Job. Please Wait")
    graph.vertices.filter {
      case (id, (followers_count, lang)) => followers_count > 10000
    }.collect().foreach {
      // The vertex attribute has no screen_name, so print the vertex id instead
      case (id, (followers_count, lang)) => out.println(s"$id has $followers_count")
    }
    out.println("Job Finished")
  } else if (str == "filtro2") {
    out.println("Starting Job. Please Wait")
    graph.vertices.filter {
      case (id, (followers_count, lang)) => lang == "es"
    }.collect().foreach {
      case (id, (followers_count, lang)) => out.println(s"$id has $followers_count")
    }
    out.println("Job Finished")
  } else {
    // Echo any other input back to the client instead of consuming an extra line
    out.println(str)
  }
  out.flush()
}
s.close()
server.close()

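As you can see, my prototype Scala script listens and, when it receives an expected command, runs it. I'm pretty sure this has to be done another way, but I can't find how.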

It sounds like you've already implemented this in a very simple socket-listener application, although it's hard to be sure without seeing any code.
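One way to keep the socket-listener approach but make it easier to extend and test is to separate the request loop from the socket and from the Spark work. The sketch below is a hypothetical refactoring, not something from the original answer: `CommandLoop`, `serve`, and `Handler` are illustrative names, and each handler is assumed to return the reply lines for one command.

```scala
import java.io.PrintStream

object CommandLoop {
  // Hypothetical handler type: runs one job and returns the lines to send back
  type Handler = () => Seq[String]

  /** Reads commands until the input is exhausted and writes one reply block per command. */
  def serve(commands: Iterator[String], out: PrintStream, handlers: Map[String, Handler]): Unit = {
    while (commands.hasNext) {
      val cmd = commands.next().trim
      handlers.get(cmd) match {
        case Some(run) =>
          out.println("Starting Job. Please Wait")
          run().foreach(line => out.println(line))
          out.println("Job Finished")
        case None =>
          // Unknown commands get an error reply; no extra input line is consumed
          out.println(s"Unknown command: $cmd")
      }
      out.flush()
    }
  }
}
```

In the socket version, `commands` would be the `getLines()` iterator from the accepted socket, and a handler such as `"filtro" -> (() => graph.vertices.filter { ... }.collect().map { ... }.toSeq)` would wrap the existing Spark job. Keeping the dispatch table in a `Map` means adding a command does not require another `if` branch, and the loop can be exercised with an in-memory iterator instead of a live socket.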

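In general, as long as your SparkContext is alive, the RDDs created through it remain usable, so if you persist them (for example with `persist()` or `cache()`), their data can stay available for further use.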

Later jobs can then reuse the persisted RDDs instead of redoing the work that produced them.
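A minimal sketch of that idea, assuming Spark is on the classpath: a long-lived object holds one SparkContext and an RDD that is persisted once, and every later request runs a new job against it. The class name `FollowerQueries` and the sample `(id, (followers_count, lang))` records are hypothetical stand-ins for the question's `graph.vertices`.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

// Hypothetical long-lived query service built around a single SparkContext
class FollowerQueries(sc: SparkContext) {
  // Built and persisted once; later jobs read the cached partitions
  // (unless Spark evicts them) instead of recomputing from the source
  private val vertices = sc.parallelize(Seq(
    (1L, (25000, "es")),
    (2L, (500, "en")),
    (3L, (12000, "en")),
    (4L, (300, "es"))
  )).persist(StorageLevel.MEMORY_ONLY)

  // Each call is a separate Spark job over the same persisted RDD
  def popularCount(threshold: Int): Long =
    vertices.filter { case (_, (followers, _)) => followers > threshold }.count()

  def countByLang(lang: String): Long =
    vertices.filter { case (_, (_, l)) => l == lang }.count()
}
```

The first job to touch `vertices` computes and caches it; subsequent calls reuse it for as long as the SparkContext stays up, which is what makes a long-running server cheaper than submitting a new application per request.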

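Is it possible to link to a gist of the simple application? I think that's the answer to what I really need to do :)

Thank you, DPM, I've edited my question to add the code.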