Scala: Spark partition column missing in partitioned table

I am creating a partitioned Parquet file in HDFS from a data source.

The data source looks like this:

scala> sqlContext.sql("select * from parquetFile").show()
+--------+-----------------+
|area_tag|              vin|
+--------+-----------------+
|       0|LSKG5GC19BA210794|
|       0|LSKG5GC15BA210372|
|       0|LSKG5GC18BA210107|
|       0|LSKG4GC16BA211971|
|       0|LSKG4GC19BA210233|
|       0|LSKG5GC17BA210017|
|       0|LSKG4GC19BA211785|
|       0|LSKG4GC15BA210004|
|       0|LSKG4GC12BA211739|
|       0|LSKG4GC18BA210238|
|       0|LSKG4GC13BA210261|
|       0|LSKG5GC16BA210106|
|       0|LSKG4GC1XBA210287|
|       0|LSKG4GC10BA210265|
|       0|LSKG5GC10CA210118|
|       0|LSKG5GC16BA212289|
|       0|LSKG5GC1XBA211016|
|       0|LSKG5GC15CA210194|
|       0|LSKG5GC12CA210119|
|       0|LSKG4GC19BA211379|
+--------+-----------------+
I created the partitions with the following command (run in the Spark shell):

When I load the partitioned table and print the data, it shows:

scala> p1.show()
+--------+-----------------+
|area_tag|              vin|
+--------+-----------------+
|        |LSKG5GC19BA210794|
|        |LSKG5GC15BA210372|
|        |LSKG5GC18BA210107|
|        |LSKG4GC16BA211971|
|        |LSKG4GC19BA210233|
|        |LSKG5GC17BA210017|
|        |LSKG4GC19BA211785|
|        |LSKG4GC15BA210004|
|        |LSKG4GC12BA211739|
|        |LSKG4GC18BA210238|
|        |LSKG4GC13BA210261|
|        |LSKG5GC16BA210106|
|        |LSKG4GC1XBA210287|
|        |LSKG4GC10BA210265|
|        |LSKG5GC10CA210118|
|        |LSKG5GC16BA212289|
|        |LSKG5GC1XBA211016|
|        |LSKG5GC15CA210194|
|        |LSKG5GC12CA210119|
|        |LSKG4GC19BA211379|
+--------+-----------------+
only showing top 20 rows

The values of the partition column are missing. What happened to the column? Is this a bug?

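Why include the partition column in the output, rather than running select vin from parquetFile where area_tag=0?

Because I need to run SQL such as select count(*), area_tag from p1 group by area_tag.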
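
For reference, here is a minimal sketch of the round trip the question describes, which may help narrow down where the values get lost. The write command is not shown in the question, so the partitionBy call below is an assumption about how the table was created. The object name, the helper names, the /tmp path, and the third sample row are hypothetical, and SparkSession (Spark 2.x or later) stands in for the sqlContext used above. In this layout the area_tag values live in directory names such as area_tag=0 rather than inside the Parquet data files, so Spark has to rebuild the column from the paths when reading.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object PartitionColumnSketch {
  // Hypothetical output location; point it at an HDFS path as needed.
  val basePath = "/tmp/vin_by_area_sketch"

  // Writes a tiny sample partitioned by area_tag (assumed write command).
  def writeSample(spark: SparkSession, path: String): Unit = {
    import spark.implicits._
    val source = Seq(
      (0, "LSKG5GC19BA210794"),  // row from the question
      (0, "LSKG5GC15BA210372"),  // row from the question
      (1, "HYPOTHETICAL000001")  // made-up row so there are two partitions
    ).toDF("area_tag", "vin")
    source.write.mode("overwrite").partitionBy("area_tag").parquet(path)
  }

  // Reading the table root lets partition discovery rebuild area_tag.
  def readAll(spark: SparkSession, path: String): DataFrame =
    spark.read.parquet(path)

  // Reading one partition directory keeps area_tag only when basePath is set.
  def readOnePartition(spark: SparkSession, path: String, areaTag: Int): DataFrame =
    spark.read.option("basePath", path).parquet(s"$path/area_tag=$areaTag")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-column-sketch")
      .master("local[*]")
      .getOrCreate()

    writeSample(spark, basePath)

    val p1 = readAll(spark, basePath)
    p1.createOrReplaceTempView("p1")
    // The aggregate the asker wants to run.
    spark.sql("select count(*), area_tag from p1 group by area_tag").show()

    readOnePartition(spark, basePath, 0).printSchema()

    spark.stop()
  }
}
```

If area_tag comes back populated in this sketch but not in the real job, it is worth comparing the path passed to the reader with the table root, and checking whether the directories on HDFS actually follow the area_tag=value naming.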