Apache Spark: getting java.lang.UnsupportedOperationException on executor pods

Tags: apache-spark, apache-spark-sql

I am running a Python script with pyspark that connects to a Kubernetes cluster and runs the job on executor pods. The idea of the script is to create a SQLContext that queries a Snowflake database. However, I am getting the following exception, and it is not descriptive enough to act on (a sketch of the read that triggers it follows the stack trace):

20/07/15 12:10:39 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at net.snowflake.client.jdbc.internal.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
    at net.snowflake.client.jdbc.internal.io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:247)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:68)
    at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:106)
    at net.snowflake.client.jdbc.ArrowResultChunk.readArrowStream(ArrowResultChunk.java:117)
    at net.snowflake.client.core.SFArrowResultSet.buildFirstChunk(SFArrowResultSet.java:352)
    at net.snowflake.client.core.SFArrowResultSet.<init>(SFArrowResultSet.java:230)
    at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.getResultSet(SnowflakeResultSetSerializableV1.java:1079)
    at net.snowflake.spark.snowflake.io.ResultIterator.liftedTree1$1(SnowflakeResultSetRDD.scala:85)
    at net.snowflake.spark.snowflake.io.ResultIterator.<init>(SnowflakeResultSetRDD.scala:78)
    at net.snowflake.spark.snowflake.io.SnowflakeResultSetRDD.compute(SnowflakeResultSetRDD.scala:41)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
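For reference, the read that triggers the exception looks roughly like this. This is a minimal sketch assuming the standard spark-snowflake connector; the connection options are placeholders, not my actual values.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snowflake-query").getOrCreate()

# Placeholder Snowflake connection options.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

df = (
    spark.read
    .format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", "SELECT CURRENT_TIMESTAMP")
    .load()
)

# The exception surfaces here, when the executors deserialize the Arrow
# result chunks returned by Snowflake.
df.show()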

Has anyone run into something similar? If so, how did you fix it?

Note: the following change was made on my local system; it may or may not work for you.

Posting the steps for how I fixed the issue.

My brew-installed Spark had a similar problem, because brew uses openjdk@11 by default. To fix it, I changed the Java version from openjdk@11 to Oracle JDK 1.8 (you can use OpenJDK 1.8 instead of Oracle JDK 1.8). With that change, my spark-submit command looks like this:

> cat spark-submit
#!/bin/bash
JAVA_HOME="/usr/share/jdk1.8.0_202" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark"  "$@"
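To confirm the switch took effect, you can print the Java version the driver JVM is actually running. A quick hedged check, relying on pyspark's py4j gateway (the _jvm handle is technically private API):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Prints e.g. "1.8.0_202" if the driver picked up the JDK 1.8 JAVA_HOME.
print(spark.sparkContext._jvm.java.lang.System.getProperty("java.version"))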
Another workaround: to resolve this issue, try setting the following under spark-submit's --conf:

spark-submit \
--conf 'spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
--conf 'spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
...
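If you start the job from a Python script rather than a spark-submit wrapper, the same options can be set on the SparkSession builder. A minimal sketch (the app name is illustrative); note that spark.driver.extraJavaOptions only takes effect if set before the driver JVM starts, so for the driver it is safer to pass it on the command line as above:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("snowflake-query")
    # Executors are launched after this point, so the executor option applies.
    .config("spark.executor.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .config("spark.driver.extraJavaOptions",
            "-Dio.netty.tryReflectionSetAccessible=true")
    .getOrCreate()
)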

I ran into the same issue and was able to resolve it. I found that with Java >= 9, io.netty.tryReflectionSetAccessible needs to be explicitly set to true for the Spark Snowflake connector to be able to read the data returned from Snowflake in the Kubernetes executor pods.

Now, since the io.netty package is shaded inside the Snowflake JDBC driver, we need to qualify the property with the full shaded package name, i.e. net.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true

This property needs to be set as a JVM option on the Spark executor pods. This can be done via the executor extra JVM options Spark property. For example:

Property name: spark.executor.extraJavaOptions

Value:
-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true
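Putting this together, here is a hedged sketch of setting the shaded property programmatically; the property name and value come from this answer, while the surrounding boilerplate is illustrative:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf()
# The io.netty package is shaded inside the Snowflake JDBC driver, so the
# property must be qualified with the full shaded package name.
conf.set(
    "spark.executor.extraJavaOptions",
    "-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true",
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()

Since the executor pods are launched after the SparkSession is created, this executor-side option takes effect even when set from the driver script.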

It seems there are some incompatible libraries on the classpath??
Hey @Srinivas, interesting theory, I like it. Do you know which library to use?
Are you using Java 11 on both machines (where I run pyspark, and in the executor pod)? Is it Oracle JDK or OpenJDK??