Apache Spark: Parquet fields show NULL when read through Hive, but show values when read through Spark

Tags: apache-spark, apache-spark-sql, spark-structured-streaming

I am writing a Spark streaming DataFrame to HDFS as Parquet files, and I have created a Hive table on top of the HDFS location. My Spark Structured Streaming write command is as follows:

    import org.apache.spark.sql.streaming.Trigger

    // Note: "latestFirst" and "startingOffsets" are source (read-side) options;
    // a Parquet file sink ignores them.
    parquet_frame.writeStream
      .option("compression", "none")
      .option("latestFirst", "true")
      .option("startingOffsets", "latest")
      .option("checkpointLocation", "/user/ddd/openareacheckpoint_feb/")
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .partitionBy("dfo_data_dt")
      .format("parquet")
      .option("path", "hdfs://ddd/apps/hive/warehouse/ddddd.db/frg_drag/")
      .start()
      .awaitTermination()
If I try to read the data from Hive, the double columns and the int columns come back as NULL; only the string and BIGINT columns show values.

But when I read the same HDFS files through spark-shell, I get the values without any NULLs. The command to read the Parquet files in Spark:

    val pp = spark.read.parquet("hdfs://ddd/apps/hive/warehouse/ddddd.db/frg_drag/dfo_data_dt=20190225/")
    pp.show
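
Since Spark reads the same files cleanly, a useful first check is whether the field names and types inside the Parquet files match the Hive DDL: the Parquet SerDe resolves columns by name, and a common cause of per-column NULLs is a name or type mismatch between what the stream wrote and what the table declares. A minimal diagnostic sketch, assuming the same spark-shell session and the path from the question (printSchema and dtypes are standard Spark APIs):

    // Inspect the schema Spark infers from the Parquet footers and compare it,
    // column by column, against the Hive CREATE TABLE statement below.
    val files = spark.read.parquet("hdfs://ddd/apps/hive/warehouse/ddddd.db/frg_drag/")
    files.printSchema()
    files.dtypes.foreach { case (name, tpe) => println(s"$name -> $tpe") }

Any column whose name differs (for example, a camelCase DataFrame column versus the lowercase name Hive stores) or whose type Hive cannot map is a candidate for the NULLs.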
My CREATE TABLE statement in Hive is as follows:

        CREATE TABLE `ddddd.frg_drag`(
     `unit` string,
     `pol` string,
     `lop` string,
     `gok` string,
     `dfo_call_group` string,
     `dfo_dfr` double,
     `dfo_dfrs` double,
     `dfo_dfrf` double,
     `dfo_dfra` double,
     `dfo_dfrgg` double,
     `dfo_dfrqq` double,
     `dfo_w_percent` double,
     `dfo_afv_percent` double,
     `dfo_endfd` double,
     `dfo_time` timestamp,
     `dfo_data_hour` int,
     `dfo_data_minute` int)
   PARTITIONED BY (
     `dfo_data_dt` bigint)
   ROW FORMAT SERDE
     'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
   OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
   LOCATION
     'hdfs://ddd/apps/hive/warehouse/ddddd.db/frg_drag'
   TBLPROPERTIES (
      'transient_lastDdlTime'='1551108381')
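
To line the two sides up from one place, the definition Hive holds can also be printed from the same spark-shell session, assuming it was started with Hive support; DESCRIBE FORMATTED is standard HiveQL:

    // Show the column names/types Hive has registered for the table,
    // to compare against the Parquet schema printed earlier.
    spark.sql("DESCRIBE FORMATTED ddddd.frg_drag").show(100, truncate = false)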

Could you please help me resolve this issue? I am new to the Spark world.

Comments: Possible duplicate. — Do we have any solution for this? I could not find the link that was shared. — I tried setting ("spark.sql.parquet.writeLegacyFormat", "true"), but it still does not work.
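
On the writeLegacyFormat suggestion: that flag changes how Spark encodes certain types (most notably decimals) in newly written Parquet files, so it must be set on the session before the streaming query starts, and it does not rewrite files that already exist. A minimal sketch of where it would go, assuming the same session that runs the writeStream command above:

    // Set before writeStream.start(); files already on HDFS keep their old
    // encoding, so affected partitions would need to be regenerated to test this.
    spark.conf.set("spark.sql.parquet.writeLegacyFormat", "true")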