Error when using PySpark with WASB / connecting PySpark to Azure Blob


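I am currently trying to connect Azure Blob storage to PySpark, but I am having trouble getting the two to work together. I have installed the two required jar files (hadoop-azure-3.2.0-javadoc.jar and azure-storage-8.3.0-javadoc.jar). I use SparkConf().setAll() to have them read in my SparkConf, and once the session is started I use: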

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")
But it always returns:

java.io.IOException: No FileSystem for scheme: wasbs

Any ideas?

I followed these steps:

It returns:

java.io.IOException: No FileSystem for scheme: wasbs
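A note on the error itself: "No FileSystem for scheme: wasbs" generally means Hadoop found no implementation class registered for the wasbs scheme. The snippet above sets a key named fs.azure, while Hadoop looks up a scheme under fs.<scheme>.impl. Below is a minimal sketch of scheme-specific settings. It assumes the hadoop-azure class jar is actually on the JVM classpath, the helper names wasbs_hadoop_settings and apply_hadoop_settings are hypothetical, and it has not been tested against this setup.

```python
def wasbs_hadoop_settings(account, key):
    """Build Hadoop settings that map the wasb/wasbs schemes to hadoop-azure classes."""
    host = "{}.blob.core.windows.net".format(account)
    return {
        # Hadoop resolves a URL scheme through the key fs.<scheme>.impl.
        "fs.wasb.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem",
        "fs.wasbs.impl": "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure",
        # Account key, using the same key pattern as in the question.
        "fs.azure.account.key." + host: key,
    }


def apply_hadoop_settings(spark, settings):
    """Copy each setting into the Hadoop configuration of an existing SparkSession."""
    hadoop_conf = spark._jsc.hadoopConfiguration()
    for name, value in settings.items():
        hadoop_conf.set(name, value)
```

With a running session, apply_hadoop_settings(spark, wasbs_hadoop_settings("acctname", "key")) would be called before spark.read.parquet(...).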


You may be missing the directory at the end of your wasb path: wasbs://container@acctname.blob.core.windows.net/. If that is not it, I am having a very similar problem.
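To illustrate the point about the path, here is a small sketch that builds a wasbs URL with a directory below the container. The function name wasbs_url and the directory data/events are made up for the example.

```python
def wasbs_url(container, account, path=""):
    """Return a wasbs:// URL for a path inside a container of a storage account."""
    base = "wasbs://{}@{}.blob.core.windows.net/".format(container, account)
    # Drop leading slashes so the result never contains a double slash after the host.
    return base + path.lstrip("/")


print(wasbs_url("container", "acctname", "data/events"))
# prints wasbs://container@acctname.blob.core.windows.net/data/events
```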
import findspark

findspark.init('dir/spark/spark-2.4.0-bin-hadoop2.7')

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.context import SparkContext
from pyspark.sql.functions import *
from pyspark.sql import SQLContext

conf = SparkConf().setAll([(u'spark.submit.pyFiles', u'/dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar,/dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,/dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,/dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,/dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,/dir/.ivy2/jars/joda-time_joda-time-2.3.jar,/dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,/dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,/dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.app.id', u'local-1553969107475'), (u'spark.driver.port', u'38809'), (u'spark.executor.id', u'driver'), (u'spark.app.name', u'PySparkShell'), (u'spark.driver.host', u'test-VM'), (u'spark.sql.catalogImplementation', u'hive'), (u'spark.rdd.compress', u'True'),(u'spark.serializer.objectStreamReset', u'100'), (u'spark.master', u'local[*]'), (u'spark.submit.deployMode', u'client'), (u'spark.repl.local.jars', u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar'), (u'spark.files', 
u'file:///dir/.ivy2/jars/com.twitter_jsr166e-1.1.0.jar,file:///dir/.ivy2/jars/io.netty_netty-all-4.0.33.Final.jar,file:///dir/.ivy2/jars/commons-beanutils_commons-beanutils-1.9.3.jar,file:///dir/.ivy2/jars/joda-time_joda-time-2.3.jar,file:///dir/.ivy2/jars/org.joda_joda-convert-1.2.jar,file:///dir/.ivy2/jars/org.scala-lang_scala-reflect-2.11.12.jar,file:///dir/.ivy2/jars/commons-collections_commons-collections-3.2.2.jar,file:///dir/.ivy2/jars/azure-storage-8.3.0-javadoc.jar,file:///dir/.ivy2/jars/hadoop-azure-3.2.0-javadoc.jar'), (u'spark.ui.showConsoleProgress', u'true')])

sc = SparkContext(conf=conf)
spark = SparkSession(sc)

spark._jsc.hadoopConfiguration().set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")

spark._jsc.hadoopConfiguration().set("fs.azure.account.key.acctname.blob.core.windows.net", "key")

sdf = spark.read.parquet("wasbs://container@acctname.blob.core.windows.net/")
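
Two details in the configuration above may be worth checking, though neither is confirmed as the cause here. The Azure jars are the -javadoc artifacts, which normally hold HTML documentation and no compiled classes. They are also listed under spark.submit.pyFiles and spark.files but not under spark.jars, which is the setting that puts jars on the driver and executor classpaths. A rough sketch of such a check follows; the function name check_jar_settings is hypothetical.

```python
def check_jar_settings(conf_pairs, needed=("hadoop-azure", "azure-storage")):
    """Return warnings about how the needed jars appear in a Spark config.

    conf_pairs is a list of (key, value) tuples, as passed to SparkConf().setAll().
    """
    conf = dict(conf_pairs)
    # File names listed under spark.jars, with any file:// prefix and directories removed.
    listed = [p.rsplit("/", 1)[-1] for p in conf.get("spark.jars", "").split(",") if p]
    warnings = []
    for name in needed:
        matches = [jar for jar in listed if name in jar]
        if not matches:
            warnings.append("{}: not listed in spark.jars".format(name))
        for jar in matches:
            if jar.endswith(("-javadoc.jar", "-sources.jar")):
                warnings.append("{}: documentation or source jar".format(jar))
    return warnings
```

Calling check_jar_settings with the list of pairs given to setAll() above should report both Azure jars as missing from spark.jars. Since this Spark build bundles Hadoop 2.7, a hadoop-azure release from the same Hadoop line may also be needed; that is an assumption and not something tested here.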