Apache Spark: How to run this code in Spark cluster mode

apache-spark cluster-computing

I want to run my code on a cluster. My code:

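import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations._
import edu.stanford.nlp.pipeline._
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.JavaConversions._
import scala.collection.mutable.ArrayBuffer

object Pre2 {

  def plainTextToLemmas(text: String, pipeline: StanfordCoreNLP): Seq[String] = {
    val doc = new Annotation(text)
    pipeline.annotate(doc)
    val lemmas = new ArrayBuffer[String]()
    val sentences = doc.get(classOf[SentencesAnnotation])
    for (sentence <- sentences; token <- sentence.get(classOf[TokensAnnotation])) {
      val lemma = token.get(classOf[LemmaAnnotation])
      if (lemma.length > 0) {
        lemmas += lemma.toLowerCase
      }
    }
    lemmas
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("pre2")

    val sc = new SparkContext(conf)
    val plainText = sc.textFile("data/in.txt")
    val lemmatized = plainText.mapPartitions(p => {
      val props = new Properties()
      props.put("annotators", "tokenize, ssplit, pos, lemma")
      val pipeline = new StanfordCoreNLP(props)
      p.map(q => plainTextToLemmas(q, pipeline))
    })
    val lemmatized1 = lemmatized.map(l => l.head + l.tail.mkString(" "))
    val lemmatized2 = lemmatized1.filter(_.nonEmpty)
    lemmatized2.coalesce(1).saveAsTextFile("data/out.txt")
  }
}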

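The cluster has the following specifications:

2 nodes

Each node has 60 GB of RAM

Each node has 48 cores

Shared disk

I installed Spark on this cluster. One node acts as both master and worker, and the other node is a worker.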

When I run the code from the terminal with this command:

./bin/spark-submit --master spark://192.168.1.20:7077 --class Main --deploy-mode cluster code/Pre2.jar

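it shows:

15/08/19 15:27:21 WARN RestSubmissionClient: Unable to connect to server spark://192.168.1.20:7077.
Warning: Master endpoint spark://192.168.1.20:7077 was not a REST server. Falling back to legacy submission gateway instead.
15/08/19 15:27:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Driver successfully submitted as driver-20150819152724-0002
... waiting before polling master for driver state
... polling master for driver state
State of driver-20150819152724-0002 is RUNNING
Driver running on 192.168.1.19:33485 (worker-20150819115013-192.168.1.19-33485)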

How can I run the above code on a Spark standalone cluster?

Make sure to check the web UI on port 8080. In your example, that would be

192.168.1.20:8080

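If you are running it in Spark standalone cluster mode, try it without --deploy-mode cluster, and hard-code the node memory by adding --executor-memory 60g.

"Warning: Master endpoint spark://192.168.1.20:7077 was not a REST server"

Judging from this message, the master's REST URL appears to be different from the one you passed. In standalone mode the REST submission endpoint typically listens on a separate port (6066 by default).
The REST URL can be found in the UI at master_URL:8080.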
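
The `--executor-memory` flag corresponds to the `spark.executor.memory` property, so the same setting can also be sketched in the driver's configuration. This is only an illustration under assumptions: the object name `ExecutorMemorySketch` and the helper `withExecutorMemory` are hypothetical, and the 60g value is simply the one suggested above. Whether a worker can actually grant that much depends on the memory it advertises in the master UI.

```scala
import org.apache.spark.SparkConf

object ExecutorMemorySketch {
  // Hypothetical helper: sets the property that --executor-memory controls.
  def withExecutorMemory(conf: SparkConf, memory: String): SparkConf =
    conf.set("spark.executor.memory", memory)

  def main(args: Array[String]): Unit = {
    // No setMaster call here: the master URL is expected to come from spark-submit.
    val conf = withExecutorMemory(new SparkConf().setAppName("pre2"), "60g")
    println(conf.get("spark.executor.memory"))
  }
}
```

Passing the flag on the command line keeps the jar unchanged across clusters, whereas setting the property in code takes precedence over the flag.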
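Your message says RUNNING, so it seems to be running correctly.

It does not return anything. In the UI, the status is FAILED.

Does the UI give any more details about why it failed?

No, there are no more details.

You are declaring --class Main, but there does not seem to be a class named Main, and you are also hard-coding the master to be local.
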
import java.util.Properties

import edu.stanford.nlp.ling.CoreAnnotations._
import edu.stanford.nlp.pipeline._
import org.apache.spark.{SparkConf, SparkContext}

import scala.collection.JavaConversions._
import scala.collection.mutable.ArrayBuffer

object Pre2 {

  def plainTextToLemmas(text: String, pipeline: StanfordCoreNLP): Seq[String] = {
    val doc = new Annotation(text)
    pipeline.annotate(doc)
    val lemmas = new ArrayBuffer[String]()
    val sentences = doc.get(classOf[SentencesAnnotation])
    for (sentence <- sentences; token <- sentence.get(classOf[TokensAnnotation])) {
      val lemma = token.get(classOf[LemmaAnnotation])
      if (lemma.length > 0 ) {
        lemmas += lemma.toLowerCase
      }
    }
    lemmas
  }
  def main(args: Array[String]): Unit = {

    // Hard-coded master: a value set here takes precedence over --master from spark-submit.
    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("pre2")

    val sc = new SparkContext(conf)
    val plainText = sc.textFile("data/in.txt")
    // Build one CoreNLP pipeline per partition rather than one per record.
    val lemmatized = plainText.mapPartitions(p => {
      val props = new Properties()
      props.put("annotators", "tokenize, ssplit, pos, lemma")
      val pipeline = new StanfordCoreNLP(props)
      p.map(q => plainTextToLemmas(q, pipeline))
    })
    // Note: l.head throws NoSuchElementException when a line yields no lemmas.
    val lemmatized1 = lemmatized.map(l => l.head + l.tail.mkString(" "))
    val lemmatized2 = lemmatized1.filter(_.nonEmpty)
    lemmatized2.coalesce(1).saveAsTextFile("data/out.txt")
  }
}
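
Following the last comment, here is a minimal sketch of how the driver setup might look without a hard-coded master. The `buildConf` helper and the line-count placeholder are illustrative assumptions, not part of the original code; the lemmatization logic from the question would replace the placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object Pre2 {
  // Hypothetical helper: builds the configuration without calling setMaster,
  // so the master URL passed to spark-submit via --master is the one used.
  def buildConf(): SparkConf = new SparkConf().setAppName("pre2")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Placeholder job; the mapPartitions lemmatization step would go here.
      val plainText = sc.textFile("data/in.txt")
      println(s"Read ${plainText.count()} lines")
    } finally {
      // Release cluster resources even if the job fails.
      sc.stop()
    }
  }
}
```

With a jar built from such an object, the `--class` argument to spark-submit would name `Pre2` (the object that defines `main`) rather than `Main`, and the master would be supplied as `--master spark://192.168.1.20:7077`.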