Python JDBC driver for SQL Server (java.sql.SQLException: No suitable driver)


I'm trying to test my Python code locally before pushing it to an AWS Glue job, but I'm running into the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
: java.sql.SQLException: No suitable driver
        at java.sql.DriverManager.getDriver(Unknown Source)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$2(JDBCOptions.scala:108)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:108)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:38)
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:354)
        at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:326)
        at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:308)
        at scala.Option.getOrElse(Option.scala:189)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:308)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:226)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Unknown Source)
Environment variables:

Location of the Microsoft JDBC driver:
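For reference, a minimal local sketch (not the exact setup above) of wiring the Microsoft driver into a plain SparkSession. "No suitable driver" usually means the mssql-jdbc jar is not on the driver's classpath, or the driver class was never named; the jar path, URL, and credentials below are placeholders:

from pyspark.sql import SparkSession

# Placeholder jar location; depending on how the JVM is launched you may need to
# pass the jar via `spark-submit --jars` or spark.driver.extraClassPath instead.
spark = (
    SparkSession.builder
    .appName("local-jdbc-smoke-test")
    .config("spark.jars", "C:/drivers/mssql-jdbc-8.4.1.jre8.jar")
    .getOrCreate()
)

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://localhost:1433;databaseName=master")
    # Naming the class explicitly avoids DriverManager's "No suitable driver" lookup failure.
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    .option("dbtable", "information_schema.tables")
    .option("user", "sa")
    .option("password", "...")
    .load()
)
df.show()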

Update: switching to GlueContext fixed the driver issue, but produced a new error:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# user and password are assumed to be defined elsewhere; note that `pass` is a
# reserved word in Python and cannot be used as a variable name.
jdbc_df_t = glue_context.spark_session.read.format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider") \
    .option("url", "jdbc:sqlserver://mssql-rds-development.xxxxx.us-east-1.rds.amazonaws.com:xxxx") \
    .option("query", "SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE FROM information_schema.columns") \
    .option("user", user) \
    .option("password", password) \
    .load()
Error after switching to GlueContext:

  File "C:\Program Files\Anaconda3\lib\site-packages\awsglue\context.py", line 45, in __init__
    self._glue_scala_context = self._get_glue_scala_context(**options)
  File "C:\Program Files\Anaconda3\lib\site-packages\awsglue\context.py", line 66, in _get_glue_scala_context
    return self._jvm.GlueContext(self._jsc.sc())
TypeError: 'JavaPackage' object is not callable
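This TypeError generally means the AWS Glue Java classes are not on the local JVM's classpath; the awsglue Python package alone is not enough. py4j returns a JavaPackage for any name it cannot resolve to a class, and a JavaPackage is not callable. A quick way to check, as a sketch:

from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()
# py4j resolves this to a JavaClass when Glue's jars are on the JVM classpath,
# and to a JavaPackage (which is not callable) when they are missing.
print(type(sc._jvm.com.amazonaws.services.glue.GlueContext))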

I wouldn't expect you to need a custom JDBC driver here; try the approach below without one (as in my recent answer: ).

This block should do what you're after:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# SQL Server requires an alias on a derived table, so the subquery passed via
# dbtable is aliased ("AS cols"); user and password are assumed to be defined
# elsewhere (`pass` is a Python keyword and cannot be used as a variable name).
jdbc_df = glue_context.spark_session.read.format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider") \
    .option("url", "jdbc:sqlserver://mssql-server.xxxxxxx.us-east-1.rds.amazonaws.com:xxxx") \
    .option("dbtable", "(SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE FROM db_name.information_schema.columns WHERE TABLE_SCHEMA = 'RDM') AS cols") \
    .option("user", user) \
    .option("password", password) \
    .load()
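If the read succeeds and you want a Glue DynamicFrame for downstream transforms, one way to convert, as a sketch using the frame and context from the block above:

from awsglue.dynamicframe import DynamicFrame

# Wrap the Spark DataFrame in a Glue DynamicFrame for use with Glue transforms.
jdbc_dyf = DynamicFrame.fromDF(jdbc_df, glue_context, "jdbc_dyf")
print(jdbc_dyf.count())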

Getting closer, but now I hit the same `TypeError: 'JavaPackage' object is not callable` traceback shown above (from awsglue\context.py, lines 45 and 66).

Sorry, on rereading your question I see you are running PySpark directly on Windows, which I missed. I'd suggest uploading the code to AWS and seeing what happens. Running code on Windows that talks to RDS brings a set of networking challenges you would need to work through (though that is certainly not the error you're seeing). Note that in the Glue job's configuration you don't need to specify any kind of JDBC driver.

Yes, once uploaded to AWS, both versions (Spark and Glue context) work fine. I was hoping to avoid constantly uploading and re-running the script just to test it locally. What do you mean by not specifying a JDBC driver?

I'm not sure the Java libraries Spark uses locally have the hooks that get injected when Spark is hosted by AWS Glue, so I can't say whether what you're trying will work locally; you're beyond my expertise :). If you really want this to work locally, you may want to load local files/data into a DataFrame or DynamicFrame (e.g., following a unit-test pattern) instead of connecting to RDS as your test bed; that's what I do. How to unit-test Glue jobs locally would probably be best as a separate question.

I've made a new discovery. It now looks like the Python module is the source of the problem:
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Same fixes as above: the derived table is aliased, and user/password are
# assumed to be defined elsewhere.
jdbc_df = glue_context.spark_session.read.format("org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider") \
    .option("url", "jdbc:sqlserver://mssql-server.xxxxxxx.us-east-1.rds.amazonaws.com:xxxx") \
    .option("dbtable", "(SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE FROM db_name.information_schema.columns WHERE TABLE_SCHEMA = 'RDM') AS cols") \
    .option("user", user) \
    .option("password", password) \
    .load()
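Along the lines of the unit-test suggestion above, a sketch of exercising the job's logic locally against a file instead of RDS; the fixture path and the filter are made up for illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("glue-local-test").getOrCreate()

# Load a local fixture that mirrors the shape of information_schema.columns.
columns_df = (
    spark.read
    .option("header", True)
    .csv("tests/fixtures/information_schema_columns.csv")
)

# Run the same downstream logic the Glue job would apply against RDS.
columns_df.filter(columns_df["IS_NULLABLE"] == "YES").show()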