
spark-submit of Scala package: ++ operator returns java.lang.NoSuchMethodError: scala.Predef$.refArrayOps

Tags: scala, apache-spark, sbt

I am running into a strange issue when trying to run a Scala Spark application with spark-submit (it works fine when run with sbt run). All of this is running locally.

I have a standard SparkSession declaration:

  val spark: SparkSession = SparkSession
    .builder()
    .master("local[*]")
    .appName("EPGSubtitleTimeSeries")
    .getOrCreate()
But when trying to run it via spark-submit as follows:

./bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.3 --master local[2] --class com.package.EPGSubtitleTimeSeries --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem /home/jay/project/tv-data-pipeline/target/scala-2.12/epg-subtitles_2.12-0.1.jar

I get the following error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)[Ljava/lang/Object;
    at com.project.Environment$.<init>(EPGSubtitleTimeSeries.scala:55)
    at com.project.Environment$.<clinit>(EPGSubtitleTimeSeries.scala)
    at com.project.EPGSubtitleJoined$.$anonfun$start_incremental_load$1(EPGSubtitleTimeSeries.scala:409)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.immutable.Set$Set3.foreach(Set.scala:163)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:47)
    at scala.collection.SetLike$class.map(SetLike.scala:92)
    at scala.collection.AbstractSet.map(Set.scala:47)
    at com.package.EPGSubtitleJoined$.start_incremental_load(EPGSubtitleTimeSeries.scala:408)
    at com.package.EPGSubtitleTimeSeries$.main(EPGSubtitleTimeSeries.scala:506)
    at com.package.EPGSubtitleTimeSeries.main(EPGSubtitleTimeSeries.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I am using Spark 2.4.4 with Scala 2.12.8 and joda-time 2.10.1 (my build.sbt has no other dependencies).
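
For reference, a minimal build.sbt matching that description might look like the following. This is only a sketch reconstructed from the versions stated above; the question does not show the actual file, and the spark-sql module name is an assumption (any Spark module resolved with %% behaves the same way):

  // Hypothetical reconstruction of the build described above (not the actual file).
  scalaVersion := "2.12.8"

  libraryDependencies ++= Seq(
    // %% appends the Scala binary suffix, so this resolves spark-sql_2.12.
    "org.apache.spark" %% "spark-sql" % "2.4.4",
    "joda-time" % "joda-time" % "2.10.1"
  )

Note that the jar path passed to spark-submit above already shows the resulting _2.12 suffix: epg-subtitles_2.12-0.1.jar.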


Does anyone know what this error means?

Following my conversation with Luis, it turned out that I had compiled with Scala 2.12 while running on a Spark build that uses Scala 2.11.

I first looked at upgrading to Spark 2.4.4 (which, I thought, would allow me to use 2.12?), but the main problem is that AWS EMR (my final target) does not support Scala 2.12.

So the final solution was to downgrade the Scala version used at compile time to 2.11.
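
Concretely, this amounts to a one-line change in build.sbt: with the Scala version lowered, %% resolves the _2.11 artifacts and sbt writes the jar to target/scala-2.11/ instead. A sketch, assuming a recent 2.11 patch release:

  // Match the Scala binary version of the Spark runtime (and of EMR).
  scalaVersion := "2.11.12"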


Many thanks to Luis for his guidance and knowledge.

Comments:

Luis Miguel Mejía Suárez: Are you sure you are using the same Spark and Scala versions at compile time and at run time?

Asker: I am doing all of this through the command line; how do I make sure of that?

Luis Miguel Mejía Suárez: How was the cluster created? If it is on-premise, ask your sysadmins which versions they use. If it is on AWS EMR, check the service version and look at the documentation for the versions of the packages it bundles. Also, if you have access to the cluster the application runs on, open a spark-shell and it will print the Spark and Scala versions. You should always use the exact same versions.

Asker: @LuisMiguelMejíaSuárez Ah, I should have been precise: before running it on AWS, I am trying to run it locally, still with spark-submit.

Luis Miguel Mejía Suárez: And are you sure the version you installed locally is the same one you used to compile?
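
A quick way to do that check locally, following the suggestion above: the spark-shell startup banner prints both versions, and they can also be queried from the REPL (spark is the SparkSession the shell provides):

  // Run inside spark-shell.
  println(spark.version)                       // Spark version, e.g. 2.4.4
  println(scala.util.Properties.versionString) // Scala version, e.g. "version 2.11.12"

Both must match the versions the application was compiled against.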
For reference, here is the code from the Environment object that the stack trace points at, where the ++ operator is applied:

  val EPG_SCHEDULE_OUTPUT_COLUMNS = Array(
    "program_title",
    "epg_titles",
    "series_title",
    "season_title",
    "date_time",
    "duration",
    "short",
    "medium",
    "long",
    "start_timestamp",
    "end_timestamp",
    "epg_year_month",
    "epg_day_of_month",
    "epg_hour_of_day",
    "epg_genre",
    "channelId"
  )

  val EPG_OUTPUT_COLUMNS: Array[String] = EPG_SCHEDULE_OUTPUT_COLUMNS ++ Array("subtitle_channel_title", "epg_channel_title", "channelTitle")
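
For context on why this particular line is where the jar blows up: ++ is not a method defined on Array itself. The compiler inserts the implicit conversion Predef.refArrayOps to wrap the array in ArrayOps first, and Scala guarantees binary compatibility only within a single binary version (2.11.x, 2.12.x, ...), so a jar compiled against one standard library can fail on exactly this call when run against another. A minimal sketch of the desugaring:

  // columns ++ other compiles to (roughly):
  //   Predef.refArrayOps(columns).++(other)
  // and refArrayOps is the method named in the NoSuchMethodError above.
  val columns: Array[String] = Array("series_title", "season_title")
  val extended: Array[String] = columns ++ Array("channelTitle")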