Scala 分区表中缺少spark分区列
我正在使用数据源在HDFS中创建一个分区拼花地板文件 数据源如下所示:Scala 分区表中缺少spark分区列,scala,apache-spark,Scala,Apache Spark,我正在使用数据源在HDFS中创建一个分区拼花地板文件 数据源如下所示: scala> sqlContext.sql("select * from parquetFile").show() +--------+-----------------+ |area_tag| vin| +--------+-----------------+ | 0|LSKG5GC19BA210794| | 0|LSKG5GC15BA210372| |
scala> sqlContext.sql("select * from parquetFile").show()
+--------+-----------------+
|area_tag| vin|
+--------+-----------------+
| 0|LSKG5GC19BA210794|
| 0|LSKG5GC15BA210372|
| 0|LSKG5GC18BA210107|
| 0|LSKG4GC16BA211971|
| 0|LSKG4GC19BA210233|
| 0|LSKG5GC17BA210017|
| 0|LSKG4GC19BA211785|
| 0|LSKG4GC15BA210004|
| 0|LSKG4GC12BA211739|
| 0|LSKG4GC18BA210238|
| 0|LSKG4GC13BA210261|
| 0|LSKG5GC16BA210106|
| 0|LSKG4GC1XBA210287|
| 0|LSKG4GC10BA210265|
| 0|LSKG5GC10CA210118|
| 0|LSKG5GC16BA212289|
| 0|LSKG5GC1XBA211016|
| 0|LSKG5GC15CA210194|
| 0|LSKG5GC12CA210119|
| 0|LSKG4GC19BA211379|
+--------+-----------------+
我使用以下命令创建分区(我是在spark shell中完成的):
当我通过从分区表加载来打印数据时,它显示:
scala> p1.show()
+--------+-----------------+
|area_tag| vin|
+--------+-----------------+
| |LSKG5GC19BA210794|
| |LSKG5GC15BA210372|
| |LSKG5GC18BA210107|
| |LSKG4GC16BA211971|
| |LSKG4GC19BA210233|
| |LSKG5GC17BA210017|
| |LSKG4GC19BA211785|
| |LSKG4GC15BA210004|
| |LSKG4GC12BA211739|
| |LSKG4GC18BA210238|
| |LSKG4GC13BA210261|
| |LSKG5GC16BA210106|
| |LSKG4GC1XBA210287|
| |LSKG4GC10BA210265|
| |LSKG5GC10CA210118|
| |LSKG5GC16BA212289|
| |LSKG5GC1XBA211016|
| |LSKG5GC15CA210194|
| |LSKG5GC12CA210119|
| |LSKG4GC19BA211379|
+--------+-----------------+
only showing top 20 rows
缺少分区列。列发生了什么,是一个错误吗?为什么在输出中包括分区列,而不是
从parquetFile(其中area\u tag=0
)中选择vin?因为我需要执行SQL,比如从p1逐区域标记中选择count(*)、area\u tag。
scala> p1.show()
+--------+-----------------+
|area_tag| vin|
+--------+-----------------+
| |LSKG5GC19BA210794|
| |LSKG5GC15BA210372|
| |LSKG5GC18BA210107|
| |LSKG4GC16BA211971|
| |LSKG4GC19BA210233|
| |LSKG5GC17BA210017|
| |LSKG4GC19BA211785|
| |LSKG4GC15BA210004|
| |LSKG4GC12BA211739|
| |LSKG4GC18BA210238|
| |LSKG4GC13BA210261|
| |LSKG5GC16BA210106|
| |LSKG4GC1XBA210287|
| |LSKG4GC10BA210265|
| |LSKG5GC10CA210118|
| |LSKG5GC16BA212289|
| |LSKG5GC1XBA211016|
| |LSKG5GC15CA210194|
| |LSKG5GC12CA210119|
| |LSKG4GC19BA211379|
+--------+-----------------+
only showing top 20 rows