Apache Spark: problem loading an Avro dataset into Teradata with Spark Streaming

Tags: apache-spark, teradata, spark-avro, spark-structured-streaming

I am trying to load a dataset of Avro files into a Teradata table via Spark Streaming over JDBC. The configuration appears to be correct, and the load succeeds up to a point: I can verify that rows have been inserted into the table. But partway through I start getting an exception and the load fails. The stack trace is below. Any clue as to what is causing this?

18/02/08 17:27:42 ERROR executor.Executor: Exception in task 2.0 in stage 0.0 (TID 0)
java.sql.BatchUpdateException: [Teradata JDBC Driver] [TeraJDBC 16.20.00.02] [Error 1154] [SQLState HY000] A failure occurred while inserting the batch of rows destined for database table "database"."table". Details of the failure can be found in the exception chain that is accessible with getNextException.
    at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:149)
    at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeBatchUpdateException(ErrorFactory.java:133)
    at com.teradata.jdbc.jdbc.fastload.FastLoadManagerPreparedStatement.executeBatch(FastLoadManagerPreparedStatement.java:2389)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:592)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:670)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:926)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1951)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: [Teradata JDBC Driver] [TeraJDBC 16.20.00.02] [Error 1147] [SQLState HY000] The next failure(s) in the exception chain occurred while beginning FastLoad of database table "database"."table"
    at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:95)
    at com.teradata.jdbc.jdbc_4.util.ErrorFactory.makeDriverJDBCException(ErrorFactory.java:70)
    at com.teradata.jdbc.jdbc.fastload.FastLoadManagerPreparedStatement.beginFastLoad(FastLoadManagerPreparedStatement.java:966)
    at com.teradata.jdbc.jdbc.fastload.FastLoadManagerPreparedStatement.executeBatch(FastLoadManagerPreparedStatement.java:2210)

The problem stems from trying to load data into an existing table in append mode, which FastLoad does not support. The target table must be empty, i.e. truncated each time the process runs. That makes FastLoad useful for staging data before processing it, but not for storing it.
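
A minimal batch-mode sketch of that fix, assuming Spark 2.x with the spark-avro package and the Teradata JDBC driver with TYPE=FASTLOAD in the URL; the host, database, table, path, and credential values below are placeholders, not values from the question:

import org.apache.spark.sql.{SaveMode, SparkSession}

object AvroToTeradataStage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-teradata-stage")
      .getOrCreate()

    // Hypothetical input path; in Spark 2.x the Avro reader comes from the
    // spark-avro package (it is built in as format "avro" from Spark 2.4 on).
    val df = spark.read
      .format("com.databricks.spark.avro")
      .load("/path/to/avro")

    df.write
      .format("jdbc")
      // TYPE=FASTLOAD in the JDBC URL enables the FastLoad protocol, which
      // requires the target table to be empty when the load begins.
      .option("url", "jdbc:teradata://tdhost/DATABASE=stage_db,TYPE=FASTLOAD")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "stage_db.stage_table")
      .option("user", "td_user")
      .option("password", "td_password")
      // With Overwrite, Spark empties the table first. Whether it issues a
      // TRUNCATE (truncate=true, dialect permitting) or drops and recreates
      // the table depends on the Spark version and its Teradata dialect.
      .option("truncate", "true")
      .mode(SaveMode.Overwrite)
      .save()

    spark.stop()
  }
}

For a streaming job, the same idea would apply per micro-batch: land each batch in the truncated staging table, then move the rows into the permanent table with an INSERT ... SELECT, so that FastLoad always sees an empty target.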