Apache Spark query of Hive via PySpark returns empty results

I am running Spark 2.1.0 on an AWS EMR cluster (based on the following -)

I am trying to query an existing table in a remote Hive that contains data. The schema is returned correctly, but the table contents are empty. Any ideas?

import os
import findspark
findspark.init('/usr/lib/spark/')

# Spark related imports
from pyspark.sql import SparkSession
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
spark = SparkSession.builder.config(conf=sc.getConf()).getOrCreate()

remote_hive = "jdbc:hive2://myhost:10000/mydb"
driver = "org.apache.hive.jdbc.HiveDriver"
user="user"
password = "password"

df = spark.read.format("jdbc").\
    options(url=remote_hive, 
            driver=driver, 
            user=user, 
            password=password, 
            dbtable="mytable").load()


df.printSchema()
# returns the right schema
df.count()
0

Can you try:

spark\
  .read.format("jdbc")\
  .option("driver", driver)\
  .option("url", remote_hive)\
  .option("dbtable", "mytable")\
  .option("user", user)\
  .option("password", password)\
  .load()
Same result - an empty table (df.count() = 0).
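For completeness, a common alternative worth trying (my suggestion, not from the thread) is to bypass JDBC entirely and read the table through the Hive metastore, since Spark's JDBC source and the Hive JDBC driver are an awkward pairing. This is a sketch under the assumption that the remote metastore's Thrift endpoint is reachable from the EMR cluster; the host and port below are placeholders (9083 is the Hive metastore default, not a value from the thread):

```python
from pyspark.sql import SparkSession

# Placeholder metastore URI: replace "myhost:9083" with the actual
# Thrift endpoint of the remote Hive metastore (assumption).
spark = (
    SparkSession.builder
    .config("hive.metastore.uris", "thrift://myhost:9083")
    .enableHiveSupport()  # read Hive tables natively, no JDBC driver
    .getOrCreate()
)

# Reads the table data directly from the metastore-registered
# storage location instead of going through HiveServer2/JDBC.
df = spark.table("mydb.mytable")
df.printSchema()
print(df.count())
```

With this route the JDBC URL, driver, and credentials from the question are not needed; Spark reads the underlying files itself, so an empty result would point at the table location rather than the connector.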