SQL Server: INSERT works in SQL but fails via PySpark on Databricks
I created the following table in SQL Server with this command:
CREATE TABLE [dbo].[Validation](
[RuleId] [int] IDENTITY(1,1) NOT NULL,
[AppId] [varchar](255) NOT NULL,
[Date] [date] NOT NULL,
[RuleName] [varchar](255) NOT NULL,
[Value] [nvarchar](4000) NOT NULL
)
Note the identity column RuleId.
Inserting values into the table directly in SQL works, as shown below. Note that the primary key is not supplied; because it is an incrementing identity column, it is populated automatically:
INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')
However, when I create a temp view on Databricks and run the same insert through PySpark, as below:
%python
driver = "<Driver>"
url = "jdbc:sqlserver:<URL>"
database = "<db>"
table = "dbo.Validation"
user = "<user>"
password = "<pass>"

# read the remote table over JDBC
remote_table = spark.read.format("jdbc") \
    .option("driver", driver) \
    .option("url", url) \
    .option("database", database) \
    .option("dbtable", table) \
    .option("user", user) \
    .option("password", password) \
    .load()

remote_table.createOrReplaceTempView("YOUR_TEMP_VIEW_NAMES")

spark.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')")
I get the following error:
AnalysisException: "'unknown' requires that the data to be inserted have the same number of columns as the target table: target table has 5 column(s) but the inserted data has 4 column(s), including 0 partition column(s) having constant value(s).;"
Why does it work in SQL but not when the query is passed through Databricks? How can I insert via PySpark without hitting this error?

The easiest solution here is to use JDBC from a Scala cell. E.g.:
%scala
import java.util.Properties
import java.sql.DriverManager
val jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
val jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
val driverClass = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
// Create the JDBC URL without passing in the user and password parameters.
val jdbcUrl = s"jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
// Create a Properties() object to hold the parameters.
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUsername}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)
val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
val stmt = connection.createStatement()
val sql = "INSERT INTO dbo.Validation VALUES ('TestApp','2020-05-15','MemoryUsageAnomaly','2300MB')"
stmt.execute(sql)
connection.close()
You can also use pyodbc, but the SQL Server ODBC drivers are not installed by default, whereas the JDBC driver is.
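For completeness, a minimal pyodbc sketch, assuming the Microsoft ODBC driver has been installed on the cluster (the driver name, server, and credentials below are placeholders, and insert_validation is a hypothetical helper, not part of the original code):

```python
def insert_validation(server, database, user, password,
                      app_id, date, rule_name, value):
    """Insert one row into dbo.Validation over ODBC (parameterized query)."""
    import pyodbc  # imported lazily; requires the MS ODBC driver on the host
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={server};DATABASE={database};UID={user};PWD={password}"
    )
    try:
        cur = conn.cursor()
        # RuleId is an identity column, so it is omitted and fills itself
        cur.execute(
            "INSERT INTO dbo.Validation (AppId, Date, RuleName, Value) "
            "VALUES (?, ?, ?, ?)",
            app_id, date, rule_name, value,
        )
        conn.commit()
    finally:
        conn.close()
```

Because the statement runs server-side as T-SQL, the explicit column list is allowed and the identity column is skipped.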
The solution for Spark is to create a view in SQL Server and insert against that view. E.g.:
create view Validation2 as
select AppId,Date,RuleName,Value
from Validation
Then insert against the view (Validation2) rather than the base table; because the view omits the identity column, the four supplied values match its four columns.
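A sketch of that insert from PySpark using the DataFrame JDBC writer in append mode. It assumes the same url/user/password values as the earlier cell; validation_row and insert_via_view are hypothetical helpers, split out so the row shape can be checked without a cluster:

```python
def validation_row(app_id, date, rule_name, value):
    # One 4-tuple per row: matches the view's columns, identity omitted.
    return (app_id, date, rule_name, value)

def insert_via_view(spark, url, user, password):
    """Append one row to dbo.Validation2 over JDBC (sketch)."""
    df = spark.createDataFrame(
        [validation_row("TestApp", "2020-05-15",
                        "MemoryUsageAnomaly", "2300MB")],
        schema=["AppId", "Date", "RuleName", "Value"],
    )
    (df.write.format("jdbc")
        .option("url", url)
        .option("dbtable", "dbo.Validation2")
        .option("user", user)
        .option("password", password)
        .mode("append")
        .save())
```

SQL Server accepts inserts through the view because it is a simple (updatable) view over a single table.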
If you want to encapsulate the Scala and call it from another language (like Python), you can use a Scala package cell, e.g. the JDBCFacade package cell shown in the update below. You can then call it from Python like this:
jdbcUsername = dbutils.secrets.get(scope = "kv", key = "sqluser")
jdbcPassword = dbutils.secrets.get(scope = "kv", key = "sqlpassword")
jdbcUrl = "jdbc:sqlserver://xxxx.database.windows.net:1433;database=AdventureWorks;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
sql = "select 1 a into #foo from sys.objects"
sc._jvm.example.JDBCFacade.runStatement(jdbcUrl,sql, jdbcUsername, jdbcPassword)
Comments:

- I tried sqlContext.sql("INSERT INTO YOUR_TEMP_VIEW_NAMES (AppId, Date, RuleName, Value) VALUES (1, '2020-05-15', 'MemoryUsageAnomaly', '2300MB')") and got: ParseException: mismatched input 'AppId' expecting {'(', 'SELECT', 'FROM', 'DESC', 'VALUES', 'TABLE', 'INSERT', 'DESCRIBE', 'MAP', 'MERGE', 'UPDATE', 'REDUCE'} (line 1, pos 34). Since I'm running the SQL from Python, wouldn't I get the same parse exception either way?
- Spark SQL does not support specifying target columns in an INSERT. The Scala example runs T-SQL, not Spark SQL, so there you can either list the input columns or let SQL Server fill the identity column automatically.
- Do you have any suggestions for running the Scala from a Python environment? I need to call it from inside a Python function.
- Good question. I just learned about Scala package cells, and they work here; see the update below.
- Runs well. Thanks for the help!

Update (Scala package cell):
%scala
package example
import java.util.Properties
import java.sql.DriverManager
object JDBCFacade
{
def runStatement(url : String, sql : String, userName : String, password: String): Unit =
{
val connection = DriverManager.getConnection(url, userName, password)
val stmt = connection.createStatement()
try
{
stmt.execute(sql)
}
finally
{
connection.close()
}
}
}
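With the facade compiled, the original insert can be pushed through it from Python as T-SQL. The snippet below only builds the statement; build_insert_statement is a hypothetical helper added here so the string can be checked outside Databricks (note that string-built SQL is injection-prone, so prefer parameterized paths like pyodbc where possible):

```python
def build_insert_statement(app_id, date, rule_name, value):
    # T-SQL runs server-side, so the identity column can simply be omitted.
    cols = ("AppId", "Date", "RuleName", "Value")
    vals = "', '".join((app_id, date, rule_name, value))
    return f"INSERT INTO dbo.Validation ({', '.join(cols)}) VALUES ('{vals}')"

# On Databricks (sketch; assumes the package cell above has been run):
# sql = build_insert_statement("TestApp", "2020-05-15",
#                              "MemoryUsageAnomaly", "2300MB")
# sc._jvm.example.JDBCFacade.runStatement(jdbcUrl, sql,
#                                         jdbcUsername, jdbcPassword)
```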