
Java: a simple Apache Spark join causes a mysterious error


I have two Datasets that I can query and show() individually. One has 17 records, the other has 3.

Dataset<Row> attReader = spark
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "table_1")
    .load();

Dataset<Row> surReader = spark
    .read()
    .format("org.apache.spark.sql.cassandra")
    .option("table", "table_2")
    .load();
When I try to join them and show the result:

    Dataset<Row> joined = attReader.join(surReader,
        attReader.col("key_field").equalTo(surReader.col("key_field")), "inner");
    joined.show();
I'm sure the join fields are correct, since I can show each dataset individually and see the data. The join field is a String.

I get the following exception, which doesn't offer much help:

org.apache.spark.SparkException: Exception thrown in awaitResult: 
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:136)
    at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:367)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:144)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:140)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:135)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:232)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:102)
    at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:181)
    at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:85)
    at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:206)
    at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:181)
    at org.apache.spark.sql.execution.RowDataSourceScanExec.consume(DataSourceScanExec.scala:77)
    at org.apache.spark.sql.execution.RowDataSourceScanExec.doProduce(DataSourceScanExec.scala:125)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.RowDataSourceScanExec.produce(DataSourceScanExec.scala:77)
    at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:125)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:85)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:97)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:39)
    at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:45)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:88)
    at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:83)
    at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:35)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:524)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:576)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:337)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
    at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:723)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:682)
    at org.apache.spark.sql.Dataset.show(Dataset.scala:691)
    at com.kilonova.CassandraStream.sparkSql(CassandraStream.java:111)
    at com.kilonova.CassandraStream.init(CassandraStream.java:174)
    at com.kilonova.Main.runStream(Main.java:20)
    at com.kilonova.Main.main(Main.java:14)
Caused by: java.lang.IllegalArgumentException
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.<init>(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:46)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:449)
    at org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:432)
    at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:103)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:103)
    at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
    at org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:432)
    at org.apache.xbean.asm5.ClassReader.a(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.b(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.xbean.asm5.ClassReader.accept(Unknown Source)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:262)
    at org.apache.spark.util.ClosureCleaner$$anonfun$org$apache$spark$util$ClosureCleaner$$clean$14.apply(ClosureCleaner.scala:261)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:261)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2073)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
    at org.apache.spark.sql.execution.SparkPlan.executeCollectIterator(SparkPlan.scala:304)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:76)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
    at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:97)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
    at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:844)
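
The `Caused by: java.lang.IllegalArgumentException` thrown from `org.apache.xbean.asm5.ClassReader` is the key clue: ASM 5 cannot parse class files newer than Java 8, and the `java.base/` frames show the job is running on a Java 9+ JVM. Spark 2.x only supports Java 8, so the most reliable fix is to run the job on Java 8. As a hedged workaround sketch (assuming the failure really originates in the broadcast exchange, as `BroadcastExchangeExec` in the trace suggests), you can also stop Spark from planning a broadcast join by disabling the auto-broadcast threshold before the join runs:

    // Assumption: 'spark' is the same SparkSession used in the question.
    // Setting the threshold to -1 disables automatic broadcast joins, so
    // Spark falls back to a sort-merge join instead of BroadcastHashJoinExec.
    spark.conf().set("spark.sql.autoBroadcastJoinThreshold", "-1");

Note that this only sidesteps the broadcast path; if the root cause is the Java version mismatch, other operations may still fail until the job runs on Java 8.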
You can confirm which join strategy Spark chose by inspecting the physical plan:

    dataframe.queryExecution.sparkPlan
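
In Java, a minimal equivalent is to call explain() on the joined Dataset, which prints the plan to stdout (the BroadcastHashJoin node should appear there):

    // Prints the physical plan; with extended=true it also shows the
    // parsed, analyzed, and optimized logical plans.
    joined.explain(true);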