Java: how can I make Apache Spark ignore dots in a query?
Given the following JSON file:
[{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}]
why does this Java code fail:
DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog.woof").count().show();
while this one does not:
DataFrame df = sqlContext.read().json("dogfile.json");
df.groupBy("dog*woof").count().show();
Here is a snippet of the failure:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'dog.woof' given input columns: [dog*woof, dog.woof];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:335)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:334)
...
It fails because Spark's column resolver uses the dot to access the fields of a struct column. You can escape the column name with backticks:
val df = sqlContext.read.json(sc.parallelize(Seq(
"""{"dog*woof":"bad dog 1","dog.woof":"bad dog 32"}"""
)))
df.groupBy("`dog.woof`").count.show
// +----------+-----+
// | dog.woof|count|
// +----------+-----+
// |bad dog 32| 1|
// +----------+-----+
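The same backtick escaping works from Java with the API used in the question. A minimal sketch (the class name `BacktickExample` and the `local[*]` master are illustrative assumptions; `dogfile.json` is the file from the question):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class BacktickExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("backticks")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(sc);

        DataFrame df = sqlContext.read().json("dogfile.json");

        // Backticks make the parser treat the dot as a literal
        // character instead of a struct-field accessor.
        df.groupBy("`dog.woof`").count().show();

        sc.stop();
    }
}
```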
However, using special characters in column names is bad practice: such names cannot be referenced directly in most contexts and should generally be avoided.
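If you control the DataFrame, a safer option is to rename the offending column once and use the plain name from then on. A hedged sketch (`df` is assumed to be the DataFrame read above; the target name `dog_woof` is an arbitrary choice):

```java
// withColumnRenamed matches the existing name literally,
// so no backtick escaping is needed for the old name.
DataFrame renamed = df.withColumnRenamed("dog.woof", "dog_woof");

// The dot-free name can now be used without escaping.
renamed.groupBy("dog_woof").count().show();
```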