
Apache Spark: Exception thrown when inserting DataFrame results into a Hive table

Tags: apache-spark, pyspark, apache-spark-sql

This is my code snippet. When spark.sql(query) is executed, I get the exception below.

My table_v2 has 262 columns and my table_v3 has 9 columns.

Has anyone faced a similar issue and can help resolve it? Thanks in advance.

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
sc = spark.sparkContext

# Read both source tables from the Hive metastore
df1 = spark.sql("select * from myDB.table_v2")
df2 = spark.sql("select * from myDB.table_v3")

# Join on the three key columns and keep only table_v2's columns
result_df = df1.join(df2, (df1.id_c == df2.id_c) & (df1.cycle_r == df2.cycle_r) & (df1.consumer_r == df2.consumer_r))
final_result_df = result_df.select(df1["*"])

# Deduplicate and insert into the target table via Spark SQL
final_result_df.distinct().createOrReplaceTempView("results")
query = "INSERT INTO TABLE myDB.table_v2_final select * from results"
spark.sql(query)
I tried setting this parameter in the conf, but it did not help:

spark.sql.debug.maxToStringFields=500
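
For reference, a setting like this can also be applied when the session is built (a minimal sketch; note that spark.sql.debug.maxToStringFields only controls how many fields are shown when plans and schemas are rendered in logs, so it would not change the write behavior itself):

from pyspark.sql import SparkSession

# Sketch: passing the setting at session construction. This only affects
# log/debug string truncation, not the actual ORC write.
spark = (SparkSession.builder
         .enableHiveSupport()
         .config("spark.sql.debug.maxToStringFields", "500")
         .getOrCreate())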
Error:

20/12/16 19:28:20 ERROR FileFormatWriter: Job job_20201216192707_0002 aborted.
20/12/16 19:28:20 ERROR Executor: Exception in task 90.0 in stage 2.0 (TID 225)
org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:285)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:197)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:196)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:109)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Missing required char ':' at 'struct<>
    at org.apache.orc.TypeDescription.requireChar(TypeDescription.java:293)
    at org.apache.orc.TypeDescription.parseStruct(TypeDescription.java:326)
    at org.apache.orc.TypeDescription.parseType(TypeDescription.java:385)
    at org.apache.orc.TypeDescription.fromString(TypeDescription.java:406)
    at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.org$apache$spark$sql$execution$datasources$orc$OrcSerializer$$createOrcValue(OrcSerializer.scala:226)
    at org.apache.spark.sql.execution.datasources.orc.OrcSerializer.<init>(OrcSerializer.scala:36)
    at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.<init>(OrcOutputWriter.scala:36)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:108)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:367)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:378)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:269)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:267)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1415)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
    ... 8 more
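
The struct<> in the IllegalArgumentException is a hint: judging from the stack trace, the ORC serializer builds its TypeDescription from the target table's schema string, and struct<> means that schema has no columns. A quick way to compare what the metastore thinks the table looks like against the DataFrame being written (a hedged diagnostic sketch, using the names from the code above):

# Sketch: inspect the target table's schema as Hive/Spark sees it...
spark.sql("DESCRIBE FORMATTED myDB.table_v2_final").show(200, truncate=False)

# ...and the schema of the DataFrame we are trying to insert.
final_result_df.printSchema()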

I dropped my myDB.table_v2_final table and changed the insert line in my code to the query below, and it worked.

I suspect there was some problem with the way I had originally created the table.

query = "create external table myDB.table_v2_final as select * from results"