ClassNotFoundException when connecting to Snowflake from PySpark on my local machine
I am trying to connect from PySpark to Snowflake on my local machine. My code looks like this:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

sc = SparkContext("local", "sf_test")
spark = SQLContext(sc)
spark_conf = SparkConf().setMaster('local').setAppName('sf_test')
sfOptions = {
"sfURL" : "someaccount.some.address",
"sfAccount" : "someaccount",
"sfUser" : "someuser",
"sfPassword" : "somepassword",
"sfDatabase" : "somedb",
"sfSchema" : "someschema",
"sfWarehouse" : "somedw",
"sfRole" : "somerole",
}
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
When I run this particular block, I get an error:
df = spark.read.format(SNOWFLAKE_SOURCE_NAME).options(**sfOptions).option("query","""select * from
"PRED_ORDER_DEV"."SALES"."V_PosAnalysis" pos
ORDER BY pos."SAPAccountNumber", pos."SAPMaterialNumber" """).load()
Py4JJavaError: An error occurred while calling o115.load:
java.lang.ClassNotFoundException: Failed to find data source:
net.snowflake.spark.snowflake. Please find packages at:
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:657)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:194)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:167)
I have downloaded the Spark connector and the JDBC JAR file and added them to the classpath:
pyspark --packages net.snowflake:snowflake-jdbc:3.11.1,net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4
CLASSPATH = C:\Program Files\Java\jre1.8.0_241\bin;C:\snowflake_jar
I want to be able to connect to Snowflake and read data with PySpark. Any help would be appreciated.

To run a PySpark application you can use spark-submit and pass the JARs under the --packages option. I assume you want to run in client mode, so pass that to the --deploy-mode option, and finally add the name of your PySpark program, like this:
spark-submit --packages net.snowflake:snowflake-jdbc:3.11.1,net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4 --deploy-mode client spark-snowflake.py
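The --packages value is nothing more than a comma-separated list of Maven coordinates in group:artifact:version form. A small sketch of how the value in the command above is assembled (the list itself is the only input; no Spark is involved):

```python
# Assemble the --packages value from Maven coordinates (group, artifact, version).
coords = [
    ("net.snowflake", "snowflake-jdbc", "3.11.1"),
    ("net.snowflake", "spark-snowflake_2.11", "2.5.7-spark_2.4"),
]
packages = ",".join(":".join(c) for c in coords)
print(packages)
# net.snowflake:snowflake-jdbc:3.11.1,net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4
```

Note that spark-submit resolves these coordinates from Maven Central at launch, so the CLASSPATH environment variable shown in the question plays no role here.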
Here is a working script. You should create a directory jar in the root of your project and add two JARs to it:
- snowflake-jdbc-3.13.4.jar (the JDBC driver)
- spark-snowflake_2.12-2.9.0-spark_3.1.jar (the Spark connector)
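The connector JAR name encodes both the Scala version and the Spark version it was built against; a mismatch with your installation is a common cause of the ClassNotFoundException above. A sketch of the naming convention (the helper function is hypothetical, purely illustrative, and not part of any library):

```python
def connector_jar_name(scala: str, connector: str, spark: str) -> str:
    """Build the expected spark-snowflake JAR file name from its version parts.

    Hypothetical helper that only illustrates the naming pattern; always
    verify the actual artifact name on Maven Central before depending on it.
    """
    return f"spark-snowflake_{scala}-{connector}-spark_{spark}.jar"

print(connector_jar_name("2.12", "2.9.0", "3.1"))
# spark-snowflake_2.12-2.9.0-spark_3.1.jar
```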
from pyspark.sql import SparkSession
sfOptions = {
"sfURL": "sfURL",
"sfUser": "sfUser",
"sfPassword": "sfPassword",
"sfDatabase": "sfDatabase",
"sfSchema": "sfSchema",
"sfWarehouse": "sfWarehouse",
"sfRole": "sfRole",
}
spark = SparkSession.builder \
.master("local") \
.appName("snowflake-test") \
.config('spark.jars', 'jar/snowflake-jdbc-3.13.4.jar,jar/spark-snowflake_2.12-2.9.0-spark_3.1.jar') \
.getOrCreate()
SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"
df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
.options(**sfOptions) \
.option("query", "select * from some_table") \
.load()
df.show()
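One variant worth considering: the script above hard-codes credentials in sfOptions. A sketch that builds the same dictionary from environment variables instead (the variable names are my own assumption, not a Snowflake convention):

```python
import os

# Map each Snowflake option key to an environment variable.
# The environment variable names here are hypothetical.
ENV_KEYS = {
    "sfURL": "SNOWFLAKE_URL",
    "sfUser": "SNOWFLAKE_USER",
    "sfPassword": "SNOWFLAKE_PASSWORD",
    "sfDatabase": "SNOWFLAKE_DATABASE",
    "sfSchema": "SNOWFLAKE_SCHEMA",
    "sfWarehouse": "SNOWFLAKE_WAREHOUSE",
    "sfRole": "SNOWFLAKE_ROLE",
}

def sf_options_from_env(env=os.environ):
    # Fail fast with a clear message if any variable is missing.
    missing = [v for v in ENV_KEYS.values() if v not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {missing}")
    return {opt: env[var] for opt, var in ENV_KEYS.items()}
```

The resulting dictionary can be passed to .options(**sf_options_from_env()) exactly like the hard-coded sfOptions in the script.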