Apache Spark throws java.lang.UnsupportedOperationException on executor pods
I am running a Python script with PySpark that connects to a Kubernetes cluster and runs the job on executor pods. The idea of the script is to create an SQLContext that queries a Snowflake database. However, I get the following exception, and it is not descriptive enough:
20/07/15 12:10:39 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
at net.snowflake.client.jdbc.internal.io.netty.util.internal.PlatformDependent.directBuffer(PlatformDependent.java:399)
at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.getDirectBuffer(NettyArrowBuf.java:243)
at net.snowflake.client.jdbc.internal.io.netty.buffer.NettyArrowBuf.nioBuffer(NettyArrowBuf.java:233)
at net.snowflake.client.jdbc.internal.io.netty.buffer.ArrowBuf.nioBuffer(ArrowBuf.java:247)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ReadChannel.readFully(ReadChannel.java:81)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageSerializer.readMessageBody(MessageSerializer.java:696)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:68)
at net.snowflake.client.jdbc.internal.apache.arrow.vector.ipc.ArrowStreamReader.loadNextBatch(ArrowStreamReader.java:106)
at net.snowflake.client.jdbc.ArrowResultChunk.readArrowStream(ArrowResultChunk.java:117)
at net.snowflake.client.core.SFArrowResultSet.buildFirstChunk(SFArrowResultSet.java:352)
at net.snowflake.client.core.SFArrowResultSet.<init>(SFArrowResultSet.java:230)
at net.snowflake.client.jdbc.SnowflakeResultSetSerializableV1.getResultSet(SnowflakeResultSetSerializableV1.java:1079)
at net.snowflake.spark.snowflake.io.ResultIterator.liftedTree1$1(SnowflakeResultSetRDD.scala:85)
at net.snowflake.spark.snowflake.io.ResultIterator.<init>(SnowflakeResultSetRDD.scala:78)
at net.snowflake.spark.snowflake.io.SnowflakeResultSetRDD.compute(SnowflakeResultSetRDD.scala:41)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:464)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:467)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
Has anyone run into something similar before? If so, how did you fix it?

Note: the changes below were made on my local system, so they may or may not work for you. Posting the steps of how I fixed the issue.

My brew install of Spark picked up openjdk@11 by default. To fix the problem, I switched the Java version from openjdk@11 to Oracle JDK 1.8 (you can use OpenJDK 1.8 instead of Oracle JDK 1.8). After the switch, my spark-submit wrapper looks like this:
> cat spark-submit
#!/bin/bash
JAVA_HOME="/usr/share/jdk1.8.0_202" exec "/root/.linuxbrew/Cellar/apache-spark/3.0.0/libexec/bin/pyspark" "$@"
Another workaround: to get around this issue, try setting the following confs on your spark-submit:
spark-submit \
--conf 'spark.executor.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
--conf 'spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true' \
...
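If you start the session from a Python script rather than the CLI, the same two flags can be passed through the SparkSession builder. A minimal sketch; the helper below is our own illustration, not part of PySpark or the Snowflake connector, and it only builds the conf key/value pairs:

```python
def netty_reflection_confs(shaded: bool = False) -> dict:
    """Return the extraJavaOptions confs that let netty use reflection
    on Java 9+. With shaded=True, the property is qualified with the
    package prefix of the netty copy shaded inside the Snowflake JDBC
    driver (see the answer below on shading)."""
    prop = "io.netty.tryReflectionSetAccessible"
    if shaded:
        prop = "net.snowflake.client.jdbc.internal." + prop
    flag = f"-D{prop}=true"
    return {
        "spark.executor.extraJavaOptions": flag,
        "spark.driver.extraJavaOptions": flag,
    }

# Feeding the confs into a builder would look like:
#   builder = SparkSession.builder
#   for key, value in netty_reflection_confs().items():
#       builder = builder.config(key, value)
confs = netty_reflection_confs()
print(confs["spark.executor.extraJavaOptions"])
# → -Dio.netty.tryReflectionSetAccessible=true
```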
I ran into the same problem and was able to resolve it. I found that on Java >= 9, io.netty.tryReflectionSetAccessible needs to be explicitly set to true for the Spark Snowflake connector to read the data returned from Snowflake in the Kubernetes executor pods.
Now, because the io.netty package is shaded inside the Snowflake JDBC driver, we need to qualify the property with the full shaded package name, i.e. net.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true.
This property needs to be set as a JVM option on the Spark executor pods, which can be done through the executor extra-JVM-options Spark property. For example:

Property name: spark.executor.extraJavaOptions
Value: -Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true
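Putting the pieces together, the spark-submit equivalent with the fully-qualified (shaded) property would look something like this — a sketch mirroring the unshaded example earlier in the thread:

```shell
spark-submit \
  --conf 'spark.executor.extraJavaOptions=-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true' \
  --conf 'spark.driver.extraJavaOptions=-Dnet.snowflake.client.jdbc.internal.io.netty.tryReflectionSetAccessible=true' \
  ...
```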
Looks like there are some incompatible libraries on the classpath? — Hey @Srinivas, interesting theory, I like it. Do you know which library it would be? — Are you using Java 11 on both machines (where you run pyspark and in the executor pods)? Oracle JDK or OpenJDK?