SQL Server / Apache Spark - need to bulk write specific columns to a SQL Server table
I am new to Apache Spark, and I am using com.microsoft.azure.sqldb to bulk write data into SQL Server. There is no issue when the columns of the source DataFrame match those of the destination table. However, I want to write only specific columns to the table. When I try to write those specific columns, I get the error below.
import spark.implicits._  // enables the $"col" column syntax

val pd = spark.read.option("header","true").option("delimiter","\\t")
.csv("C:\\sd.txt")
// The file has 35 columns; I want to write the unique combinations of the 4 columns below into a table
val pr = pd
  .select($"CI", $"MID", $"RC", $"CK")
  .distinct()
import com.microsoft.azure.sqldb.spark.config.Config
import com.microsoft.azure.sqldb.spark.connect._
import com.microsoft.azure.sqldb.spark.bulkcopy.BulkCopyMetadata

val prConfig = Config(Map(
"url" -> "localhost",
"port" -> "1433",
"databaseName" -> "Alpha2",
"dbTable" -> "dbo.PR",
"user" -> "sqlserver",
"password" -> "*******",
"connectTimeout" -> "5", //seconds
"queryTimeout" -> "5", //seconds
"bulkCopyBatchSize" -> "200000",
"bulkCopyTableLock" -> "false",
"bulkCopyTimeout" -> "600"
))
val bulkCopyMetadata = new BulkCopyMetadata
bulkCopyMetadata.addColumnMetadata(1, "CI", java.sql.Types.INTEGER, 0, 0)
bulkCopyMetadata.addColumnMetadata(2, "MID", java.sql.Types.INTEGER, 0, 0)
bulkCopyMetadata.addColumnMetadata(3, "RC", java.sql.Types.VARCHAR, 640, 0)
bulkCopyMetadata.addColumnMetadata(4, "CK", java.sql.Types.VARCHAR, 64, 0)
pr.bulkCopyToSqlDB(prConfig, bulkCopyMetadata)
The table has 10 columns, but these 4 columns are NOT NULL with the corresponding data types (I want to insert NULL into the remaining columns). When I execute this code, I get the following error:
com.microsoft.sqlserver.jdbc.SQLServerException: Source and destination schemas do not match.
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.validateColumnMappings(SQLServerBulkCopy.java:1749)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:1579)
at com.microsoft.sqlserver.jdbc.SQLServerBulkCopy.writeToServer(SQLServerBulkCopy.java:606)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions.com$microsoft$azure$sqldb$spark$connect$DataFrameFunctions$$bulkCopy(DataFrameFunctions.scala:127)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at com.microsoft.azure.sqldb.spark.connect.DataFrameFunctions$$anonfun$bulkCopyToSqlDB$1.apply(DataFrameFunctions.scala:72)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
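One way to work around the schema mismatch, sketched under assumptions rather than a verified answer: the bulk-copy path validates the source DataFrame against all columns of the destination table, so a copy with only 4 of the 10 columns fails. Spark's built-in JDBC writer, by contrast, generates an INSERT statement that lists exactly the DataFrame's columns, letting the remaining nullable table columns default to NULL. The connection details below mirror the question's Config; the password is redacted as in the question.

```scala
// Sketch: append only the 4 selected columns via Spark's JDBC data source.
// The generated INSERT names just CI, MID, RC, CK, so the other 6 (nullable)
// columns of dbo.PR are left as NULL. Slower than bulk copy, but schema-safe.
pr.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://localhost:1433;databaseName=Alpha2")
  .option("dbtable", "dbo.PR")
  .option("user", "sqlserver")
  .option("password", "*******")
  .option("batchsize", "200000")  // rows per JDBC batch on write
  .mode("append")
  .save()
```

If bulk-copy performance is required, an alternative is to pad the DataFrame with typed null columns (via `withColumn(name, lit(null).cast(...))`) so it matches all 10 destination columns before calling `bulkCopyToSqlDB`; the 6 remaining column names are not given in the question, so they would need to be filled in from the actual table definition.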