
Java: error when running ***pyspark --conf spark.jars.packages=com.amazon.deequ:deequ:1.0.2*** in Apache Spark


I have installed Apache Spark 2.4.4 on ubuntu:16.04 along with all of its dependencies. After installation, I ran the pyspark command shown below:

pyspark --conf spark.jars.packages=com.amazon.deequ:deequ:1.0.2

After running this command, I get a missing-file error for a JAR under /home/username/.ivy2/:

java.io.FileNotFoundException: File file:/home/streamflux/.ivy2/jars/net.sourceforge.f2j_arpack_combined_all-0.1.jar does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1544)
        at org.apache.spark.SparkContext.addFile(SparkContext.scala:1508)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:462)
        at org.apache.spark.SparkContext$$anonfun$13.apply(SparkContext.scala:462)
        at scala.collection.immutable.List.foreach(List.scala:392)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:462)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)
20/05/23 09:43:14 INFO server.AbstractConnector: Stopped Spark@bb24d4c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
20/05/23 09:43:14 INFO ui.SparkUI: Stopped Spark web UI at http://0babaa1d999c:4040
20/05/23 09:43:14 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/05/23 09:43:14 INFO memory.MemoryStore: MemoryStore cleared
20/05/23 09:43:14 INFO storage.BlockManager: BlockManager stopped
20/05/23 09:43:14 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
20/05/23 09:43:14 WARN metrics.MetricsSystem: Stopping a MetricsSystem that is not running
20/05/23 09:43:14 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/05/23 09:43:14 INFO spark.SparkContext: Successfully stopped SparkContext 

I tried deleting the cache. I noticed that a file with a very similar name does exist:

net.sourceforge.f2j_arpack_combined_all-0.1-javadoc.jar

Please help me resolve this error.
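
For reference, a quick way to check which arpack artifacts the resolution actually fetched (assuming the default Ivy cache location shown in the stack trace above):

ls ~/.ivy2/jars | grep arpack
# when this resolution bug hits, only the -javadoc jar is present and the
# main net.sourceforge.f2j_arpack_combined_all-0.1.jar is missing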

I hit this error while testing jobs.

There is a report of this happening for similar invocations, which suggests that the dependency resolution itself is affected.

There appears to be no fix, because the version originally reported against was considered EOL by the time it was triaged. However, as far as I can see, it still occurs on 2.4.4/2.4.5 (we are on 2.4.5).

In our case, since we normally add dependencies directly into the Docker build via sbt, all JARs are available and loaded locally anyway.

We simply declared the affected package explicitly (the netlib "all" artifact is POM-packaged, hence pomOnly()):

libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()

You can use a similar approach, or add any relevant JARs directly to $SPARK_HOME/jars or to the Ivy folder, and it should work.
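
For example, a minimal sketch of the manual route; the Maven Central URL is an assumption based on the artifact's coordinates (net.sourceforge.f2j:arpack_combined_all:0.1), and the target file name matches the one Spark looks for in the stack trace above:

# fetch the missing jar once
wget https://repo1.maven.org/maven2/net/sourceforge/f2j/arpack_combined_all/0.1/arpack_combined_all-0.1.jar
# option 1: put it on Spark's default classpath
cp arpack_combined_all-0.1.jar $SPARK_HOME/jars/
# option 2: give it the name Spark expects in the Ivy jar cache
cp arpack_combined_all-0.1.jar ~/.ivy2/jars/net.sourceforge.f2j_arpack_combined_all-0.1.jar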

I ran into a similar problem with com.amazon.deequ:deequ:1.0.4 when running spark-submit. If you do not need it, you can work around this by excluding the package that provides net.sourceforge.f2j_arpack_combined_all-0.1.jar. You can use the following code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("deequ-test")
  .config("spark.jars.packages", "com.amazon.deequ:deequ:1.0.4")
  .config("spark.jars.excludes", "net.sourceforge.f2j:arpack_combined_all")
  .getOrCreate()
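
The same exclusion can also be applied on the command line, matching the original invocation; a sketch (spark.jars.excludes takes groupId:artifactId coordinates):

pyspark --conf spark.jars.packages=com.amazon.deequ:deequ:1.0.2 \
        --conf spark.jars.excludes=net.sourceforge.f2j:arpack_combined_all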

Python version?? Python 2.7.12 / Python 3.5.2. @Srinivas can you help?