Getting an empty table from Hive using Spark and Scala


I want to write Scala code with Spark that fetches a DataFrame from a Hive server:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import scala.util.Properties
import org.apache.spark.sql.SparkSession

// Authenticate against the Kerberos-secured cluster before touching Hive
val configuration = new Configuration
configuration.set("hadoop.security.authentication", "Kerberos")
Properties.setProp("java.security.krb5.conf", krb5LocationInMySystem)
UserGroupInformation.setConfiguration(configuration)
UserGroupInformation.loginUserFromKeytab(principal, keytabLocation)

val spSession = SparkSession.builder()
  .config("spark.master", "local")
  .config("spark.sql.warehouse.dir", "file:/Users/username/IdeaProjects/project_name/spark-warehouse/")
  .enableHiveSupport()
  .getOrCreate()

// Read the Hive table over JDBC through HiveServer2
spSession.read.format("jdbc")
  .option("url", "jdbc:hive2://host:port/default;principal=hive/host@realm.com")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "tablename")
  .load()
  .show()
When I run it, I get output like this:

column1|column2|column3....
(that is the only output)

While running, the program waits, first saying:

Will try to open client transport with JDBC Uri:(url)
Code generated in 159.970292 ms
then a few more lines... and then again:

will try to open client transport with JDBC Uri:(url)
INFO JDBCRDD: closed connection
and it returns an empty table.
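
One frequently cited cause of empty or wrong results when pairing Spark's generic JDBC source with org.apache.hive.jdbc.HiveDriver is identifier quoting: Spark builds queries such as SELECT "column1" FROM tablename, and HiveQL treats double-quoted tokens as string literals rather than column names. A minimal sketch of a workaround, assuming Spark 2.x's developer API (HiveDialect is a name invented for this example, not a dialect Spark ships with):

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Sketch: make Spark's JDBC source quote Hive identifiers with backticks.
// "HiveDialect" is a hypothetical name chosen for this example.
object HiveDialect extends JdbcDialect {
  // Apply this dialect to any HiveServer2 JDBC URL
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:hive2")

  // HiveQL quotes identifiers with backticks, not double quotes
  override def quoteIdentifier(colName: String): String = s"`$colName`"
}

// Must be registered before the read.format("jdbc")...load() call
JdbcDialects.registerDialect(HiveDialect)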

I have already searched for answers, but either they don't explain what I want, or I can't understand what they are saying. For the second link, I tried it but couldn't figure out how to use setInputPathFilter in Scala.
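
As an aside on that last point: setInputPathFilter is a static method on Hadoop's FileInputFormat, not a Spark API, and it takes the Job plus a PathFilter class. A minimal Scala sketch, where HiddenFileFilter is a made-up example class:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{Path, PathFilter}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

// Hypothetical filter that skips metadata files such as _SUCCESS
class HiddenFileFilter extends PathFilter {
  override def accept(path: Path): Boolean = {
    val name = path.getName
    !name.startsWith("_") && !name.startsWith(".")
  }
}

val job = Job.getInstance(new Configuration())
// Register the filter class; Hadoop instantiates it when listing input paths
FileInputFormat.setInputPathFilter(job, classOf[HiddenFileFilter])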

Dependencies:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.8</version>
</dependency>
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-compiler</artifactId>
    <version>2.11.8</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.1</version>
</dependency>


Have you tried it? @Ramesh Maharjan

@Ramesh Maharjan After adding .enableHiveSupport() I got the error: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'. After searching many Stack Overflow links, it now looks like val spSession = SparkSession.builder().master("local[*]").appName("appName").config("spark.sql.warehouse.dir", wareHouseLocation).enableHiveSupport().getOrCreate() and I still get the error: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState'.

What did you give in wareHouseLocation?

When I didn't add anything, the logs showed file:/Users/myusername/IdeaProjects/myprojectname/spark-warehouse/ so I just added that. It's a location on my system.

Did you do .config("spark.sql.warehouse.dir", "/Users/myusername/IdeaProjects/myprojectname/spark-wareh...")?
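
For what it's worth, the direction the comments point in can be sketched as follows. Assuming the cluster's hive-site.xml is on the classpath (an assumption; without it, enableHiveSupport() creates an empty local metastore and warehouse, which by itself would explain an empty table), the table can be read through the metastore instead of over HiveServer2 JDBC, sidestepping the identifier-quoting issue described above:

import org.apache.spark.sql.SparkSession

// Assumes hive-site.xml from the cluster is on the classpath so the
// session talks to the real metastore rather than a fresh local one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("appName")
  .enableHiveSupport()
  .getOrCreate()

// Read the Hive table through the metastore, not HiveServer2 JDBC
spark.table("default.tablename").show()

// Equivalent via SQL
spark.sql("SELECT * FROM default.tablename").show()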