
Java: How to create a SparkSession with Hive support (fails with "Hive classes are not found")?


I get an error when trying to run the following code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
  public static void main(String[] args) throws Exception {
    SparkSession
      .builder()
      .enableHiveSupport()
      .getOrCreate();        
  }
}
Output:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

How can I solve this issue?

Add the following dependency to your Maven project:

<dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.0.0</version>
</dependency>


I looked into the source code and found that, in addition to HiveSessionState (in spark-hive), another class, HiveConf, is also needed to start a SparkSession. HiveConf is not contained in the spark-hive*.jar; you can find it in the Hive-related jars and put it on your classpath.
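
A minimal diagnostic sketch (assuming Spark 2.x class names; the session-state class was renamed between 2.1 and 2.2) to probe the runtime classpath for the classes that enableHiveSupport() needs:

public class HiveClasspathCheck {
  // Probe the classpath for the classes that
  // SparkSession.Builder.enableHiveSupport() requires at runtime.
  public static void main(String[] args) {
    String[] required = {
        "org.apache.spark.sql.hive.HiveSessionState",         // Spark 2.0/2.1 (spark-hive)
        "org.apache.spark.sql.hive.HiveSessionStateBuilder",  // Spark 2.2+ (spark-hive)
        "org.apache.hadoop.hive.conf.HiveConf"                // hive-exec
    };
    for (String name : required) {
      try {
        Class.forName(name);
        System.out.println("FOUND   " + name);
      } catch (ClassNotFoundException e) {
        System.out.println("MISSING " + name);
      }
    }
  }
}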

I had the same problem. I was able to solve it by adding the following dependencies (I put this list together from a reference):


<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-avatica</artifactId>
    <version>1.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.calcite</groupId>
    <artifactId>calcite-core</artifactId>
    <version>1.12.0</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.spark-project.hive</groupId>
    <artifactId>hive-metastore</artifactId>
    <version>1.2.1.spark2</version>
</dependency>
<dependency>
    <groupId>org.codehaus.jackson</groupId>
    <artifactId>jackson-mapper-asl</artifactId>
    <version>1.9.13</version>
</dependency>
where scala.binary.version = 2.11 and spark.version = 2.1.0:

<properties>
    <scala.binary.version>2.11</scala.binary.version>
    <spark.version>2.1.0</spark.version>
</properties>


My complete list of dependencies for Spark 2.4.1 is as follows:

  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.12</artifactId>
      <version>2.4.1</version>
  </dependency>

  <dependency>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-avatica</artifactId>
      <version>1.6.0</version>
  </dependency>
  <dependency>
      <groupId>org.apache.calcite</groupId>
      <artifactId>calcite-core</artifactId>
      <version>1.12.0</version>
  </dependency>
  <dependency>
      <groupId>org.spark-project.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>1.2.1.spark2</version>
  </dependency>
  <dependency>
      <groupId>org.spark-project.hive</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>1.2.1.spark2</version>
  </dependency>
  <dependency>
      <groupId>org.codehaus.jackson</groupId>
      <artifactId>jackson-mapper-asl</artifactId>
      <version>1.9.13</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-core</artifactId>
      <version>2.6.7</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.6.7.1</version>
  </dependency>


  <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-annotations -->
  <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
      <version>2.6.7</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.codehaus.janino/janino -->
  <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>janino</artifactId>
      <version>3.0.9</version>
  </dependency>

  <!-- https://mvnrepository.com/artifact/org.codehaus.janino/commons-compiler -->
  <dependency>
      <groupId>org.codehaus.janino</groupId>
      <artifactId>commons-compiler</artifactId>
      <version>3.0.9</version>
  </dependency>


[Updating my answer] This answer on StackOverflow is correct -

I was also facing problems building and running Spark with Hive support. Based on the answers above, I did the following in my Scala 2.12.8 project:

  • Updated my build.sbt to the following
  • Manually removed the files under .idea/libraries
  • Clicked the "Refresh all sbt projects" button in the sbt shell window (I am using IntelliJ)
  • I can now run the project without any problems

    libraryDependencies += "junit" % "junit" % "4.12" % Test
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.4.2",
      "org.apache.spark" %% "spark-sql" % "2.4.2",
      "org.apache.spark" %% "spark-hive" % "2.4.2" % "provided",
      "org.scalatest" %% "scalatest" % "3.0.3" % Test
    )
    

For SBT, use the matching spark-hive library dependency (a sketch follows below).

We used Spark-Core 2.1.0 and Spark-SQL 2.1.0.
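
A minimal build.sbt sketch for that setup (assuming Scala 2.11, which the Spark 2.1.0 artifacts are published for):

    // keep spark-hive at the same version as spark-core and spark-sql
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.1.0",
      "org.apache.spark" %% "spark-sql"  % "2.1.0",
      "org.apache.spark" %% "spark-hive" % "2.1.0"
    )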

Although all of the top answers are correct, if you are still facing the issue, remember that the error described in the question can occur even when the jars are listed in your pom.

To resolve it, make sure the versions of all dependencies are the same and, as a standard practice, maintain global properties for the Spark version and the Scala version and substitute those values in the dependencies to avoid any conflicts caused by mixing versions.

FYI:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>com.xxx.rehi</groupId>
        <artifactId>Maven9211</artifactId>
        <version>1.0-SNAPSHOT</version>
    <properties>
        <scala.version>2.12</scala.version>
        <spark.version>2.4.4</spark.version>
    </properties>
    
    
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    
    
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    
    
    </dependencies>
    </project> 
    
    
In my case, I had to check "Include dependencies with 'Provided' scope" under my Run/Debug Configurations in IntelliJ.

tl;dr You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the classpath of your Spark SQL application (not just at build time, which is only required for compilation).


In other words, you must have the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes on the classpath of your Spark application (which has little to do with sbt or Maven).

The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

The latter, HiveConf, is part of the hive-exec dependency (which is itself a transitive dependency of the spark-hive dependency above).

Make sure you run your jar via the spark-submit script:

    ${SPARK_HOME}/bin/spark-submit <settings> <your-jar-name>
    
Thanks to @lamber-ken for helping me address this issue.

For more information:


I have spark-hive_2.11-2.0.1.jar, but I still got the error (I only added the jar to the classpath manually - I did not add the dependency to Maven).
@Sam-T, can you import "org.apache.spark.sql.hive.*" in a Java class? Did you try adding it to Maven and then updating the Maven project? You also need to make sure the artifactId suffix and version number are the same as your spark-core and spark-sql.
This did not help me; in my case it was because I was running it through the JVM (java) directly rather than via the provided spark script bin/spark-submit.
It does not work. I tried --jars spark-hive_2.11-2.4.4.jar and the Maven dependency; neither worked.
Please import the spark-hive Maven dependency in pom.xml, add a properties tag in the pom with global variables for spark.version and scala.version, and substitute those variables in the dependencies to avoid version conflicts.
Why do you need them as …
    pom.xml
    ---
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.4.4</version>
      <scope>compile</scope>
    </dependency>
    
    Test.java
    ---
    SparkSession spark = SparkSession
        .builder()
        .appName("FeatureExtractor")
        .config("spark.master", "local")
        .config("spark.sql.hive.convertMetastoreParquet", false)
        .config("spark.submit.deployMode", "client")
        .config("spark.jars.packages", "org.apache.spark:spark-avro_2.11:2.4.4")
        .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
        .config("hive.metastore.uris", "thrift://hivemetastore:9083")
        .enableHiveSupport()
        .getOrCreate();
    
    bin/spark-submit \
    --class com.TestExample \
    --executor-memory 1G \
    --total-executor-cores 2 \
    test.jar