How do I create a SparkSession with Hive support in Java (fails with "Hive classes are not found")?
Tags: java, apache-spark, hive, apache-spark-sql

I get an error when trying to run the following code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
    public static void main(String[] args) throws Exception {
        SparkSession
            .builder()
            .enableHiveSupport()
            .getOrCreate();
    }
}
Output:
Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
at com.training.hivetest.App.main(App.java:21)
How can I solve it?

Add the following dependency to your Maven project:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.0.0</version>
</dependency>
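With that dependency on the classpath, a session with Hive support can be created. A minimal sketch (the appName and local master below are illustrative values for local testing, not from the original question):

import org.apache.spark.sql.SparkSession;

public class HiveSessionExample {
    public static void main(String[] args) {
        // Requires spark-hive and its transitive dependencies at runtime.
        SparkSession spark = SparkSession
                .builder()
                .appName("hive-test")   // illustrative name
                .master("local[*]")     // for local testing only
                .enableHiveSupport()
                .getOrCreate();
        spark.sql("SHOW DATABASES").show(); // simple smoke test
        spark.stop();
    }
}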
I looked into the source code and found that, in addition to HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initiate the SparkSession. HiveConf is not contained in the spark-hive*.jar; you may find it in the Hive-related jars and put it on your classpath.

I had the same problem. I was able to resolve it by adding the following dependencies (I resolved this list by referring to the spark-hive compile-dependency list):
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0:
<properties>
<scala.binary.version>2.11</scala.binary.version>
<spark.version>2.1.0</spark.version>
</properties>
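With these properties, the spark-hive_${scala.binary.version} artifact above resolves to spark-hive_2.11 at version 2.1.0.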
My complete list of dependencies for Spark 2.4.1 is here:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>2.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-avatica</artifactId>
<version>1.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.calcite</groupId>
<artifactId>calcite-core</artifactId>
<version>1.12.0</version>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>1.2.1.spark2</version>
</dependency>
<dependency>
<groupId>org.spark-project.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>1.2.1.spark2</version>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-mapper-asl</artifactId>
<version>1.9.13</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.6.7.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.6.7</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
<dependency>
<groupId>org.codehaus.janino</groupId>
<artifactId>janino</artifactId>
<version>3.0.9</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
<dependency>
<groupId>org.codehaus.janino</groupId>
<artifactId>commons-compiler</artifactId>
<version>3.0.9</version>
</dependency>
[Updating my answer] This answer on StackOverflow is correct.
I also faced the problem of building and running Spark with Hive support. Based on the above answer, I did the following in my Scala 2.12.8 project, and it works fine:
libraryDependencies += "junit" % "junit" % "4.12" % Test
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "2.4.2",
"org.apache.spark" %% "spark-sql" % "2.4.2",
"org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
"org.scalatest" %% "scalatest" % "3.0.3" % Test
)
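Note that the "provided" scope on spark-hive above means the jar is on the compile classpath but is expected to be supplied by the environment at runtime (for example by spark-submit); if you launch the application directly with plain java or sbt run, the Hive classes will typically be missing again and the same error appears.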
For SBT use:

// https://mvnrepository.com/artifact/org.apache.spark/spark-hive
libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.1.0"
We used spark-core 2.1.0 and spark-sql 2.1.0.

Although all the top answers are correct, if you are still facing issues, keep in mind that the error described in the question can still occur even when the JARs are mentioned in your pom.

To resolve this issue, make sure the versions of all the dependencies are the same and, as a standard practice, maintain global variables for the Spark version and the Scala version, then substitute those values in the dependencies to avoid conflicts between different versions.

FYI:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.xxx.rehi</groupId>
<artifactId>Maven9211</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<scala.version>2.12</scala.version>
<spark.version>2.4.4</spark.version>
</properties>
<dependencies>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
</project>
In my case, I had to check

Include dependencies with "Provided" scope

under my Run/Debug Configuration in IntelliJ.

tl;dr You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of a Spark SQL application (not just at build time, where they are merely required for compilation).

In other words, you have to have the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes on the CLASSPATH of the Spark application (which has little to do with sbt or maven).
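As a quick way to confirm that both classes are actually visible at runtime, a small diagnostic along these lines can be run (this sketch is mine, not part of the original answer; the two class names come from the paragraph above):

// Diagnostic sketch: checks whether the classes Spark needs for
// enableHiveSupport() are present on the runtime classpath.
public class HiveClasspathCheck {
    public static void main(String[] args) {
        String[] required = {
            "org.apache.spark.sql.hive.HiveSessionStateBuilder", // from spark-hive
            "org.apache.hadoop.hive.conf.HiveConf"               // from hive-exec
        };
        for (String name : required) {
            try {
                Class.forName(name);
                System.out.println("found:   " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING: " + name);
            }
        }
    }
}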
The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

The latter, HiveConf, is part of the hive-exec dependency (which is itself a transitive dependency of the aforementioned spark-hive dependency).

Make sure you run your jar via the spark-submit script:
${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>
Thanks to @lamber-ken, who helped me resolve this issue.

For more information:
I have spark-hive_2.11-2.0.1.jar but I still got the error (I just added the jar manually to the classpath; I did not add the dependency to Maven).

@Sam-T are you able to import "org.apache.spark.sql.hive.*" in your java class? Did you try adding it to maven and then updating the maven project? You also need to make sure you have the same artifactId suffix and version number as spark-core and spark-sql.

This didn't help me; in my case it was because I ran the job via the JVM (java) directly instead of the provided spark script /bin/spark-submit.

It doesn't work. I tried --jar spark-hive_2.11-2.4.4.jar as well as the maven dependency; neither works.

Please import the spark-hive maven dependency in pom.xml, add a properties tag in the pom with global variables for spark.version and scala.version respectively, and substitute those variables in the dependencies to avoid version conflicts.

Why do you need them as ...
pom.xml
---
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.4.4</version>
<scope>compile</scope>
</dependency>
Test.java
---
SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("hive.metastore.uris", "thrift://hivemetastore:9083")
        .enableHiveSupport()
        .getOrCreate();
bin/spark-submit \
--class com.TestExample \
--executor-memory 1G \
--total-executor-cores 2 \
test.jar