Apache Spark: How do I query in Spark SQL whether a condition holds for at least one element of an array-type column?

I have the following schema and want to retrieve all rows in which at least one "af" field in the "dbsnpAnnots" array is less than 0.1:

scala> randVarsDF.printSchema
root
|-- chr: string (nullable = true)
|-- pos: long (nullable = true)
|-- ref: string (nullable = true)
|-- alt: string (nullable = true)
|-- dbsnpAnnots: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- af: double (nullable = true)
|    |    |-- common: boolean (nullable = true)
|    |    |-- rsid: string (nullable = true)

I know how to do this with a UDF and with the Dataset API, but I would also like to be able to do it in SQL.

What I am doing now is:

select count(*) from RANDVARS where dbsnpAnnots[0].af < 0.1 or dbsnpAnnots[1].af < 0.1 or dbsnpAnnots[2].af < 0.1

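However, this only checks the first three elements of the dbsnpAnnots array. I want to check all elements, since there may be more than three.

I also tried: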

select count(*) from RANDVARS where dbsnpAnnots[*].af < 0.1

but that is not a valid Spark SQL query.

Any ideas?

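You need to explode the array. Because it is an array of structs, you can use inline:

-- inline() expands each struct in the array into its own row,
-- exposing af, common and rsid as ordinary columns
select count(1)
from (
  select inline(dbsnpAnnots) from RANDVARS
) p
where p.af < 0.1

Comments:

Thank you very much!

You're welcome. If the answer solved your problem, please accept it.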
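
A note on the inline query above: because inline produces one output row per array element, count(1) counts matching annotations, so a row with two af values below 0.1 is counted twice. One possible alternative, assuming Spark 2.4 or later (where the higher-order function exists is available in Spark SQL), is to test the array in place. This is only a sketch against the schema in the question and is not part of the original answer:

```sql
-- Assumes Spark 2.4+: exists() is true when the lambda holds for at least one element.
-- Rows whose dbsnpAnnots is NULL evaluate to NULL and are dropped by WHERE.
select count(*)
from RANDVARS
where exists(dbsnpAnnots, x -> x.af < 0.1)
```

Replacing count(*) with * should return the matching rows themselves, with the dbsnpAnnots array left intact.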