Java Spark: Non-time-based windows are not supported on streaming DataFrames/Datasets


I need to write a Spark SQL query with an inner select and a window partition. The problem is that I get an exception. I have already spent several hours on this, and I have had no success with other approaches either.

The exception:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Non-time-based windows are not supported on streaming DataFrames/Datasets;;
Window [sum(cast(_w0#41 as bigint)) windowspecdefinition(deviceId#28, timestamp#30 ASC NULLS FIRST, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS grp#34L], [deviceId#28], [timestamp#30 ASC NULLS FIRST]
+- Project [currentTemperature#27, deviceId#28, status#29, timestamp#30, wantedTemperature#31, CASE WHEN (status#29 = cast(false as boolean)) THEN 1 ELSE 0 END AS _w0#41]
I assume the query is too complex to be implemented this way, but I do not know how to fix it.

 SparkSession spark = SparkUtils.getSparkSession("RawModel");

 Dataset<RawModel> datasetMap = readFromKafka(spark);

 // registerTempTable is deprecated since Spark 2.0; createOrReplaceTempView replaces it
 datasetMap.createOrReplaceTempView("test");

 // The inner select builds a group id (grp) per device that increments each time
 // status is 'false'; the outer query then aggregates each such run. The analytic
 // window in the inner select is what triggers the AnalysisException on a stream.
 Dataset<Row> res = spark.sql(
         " select deviceId, grp, avg(currentTemperature) as averageT, min(timestamp) as minTime, max(timestamp) as maxTime, count(*) as countFrame " +
         " from (select test.*, sum(case when status = 'false' then 1 else 0 end) over (partition by deviceId order by timestamp) as grp " +
         "       from test " +
         "      ) test " +
         " group by deviceId, grp ");
Any suggestion would be appreciated.
Thank you.

I think the problem is the window specification:

over (partition by deviceId order by timestamp) 
The partitioning needs to be on a time-based column, which in your case is timestamp. The following should work:

over (partition by timestamp order by timestamp) 
Of course, that does not solve your problem. You could try the following, although it is not clear whether Spark will support it:

over (partition by timestamp, deviceId order by timestamp) 
Even if Spark does support it, this would still change the semantics of your query.
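
If fixed event-time buckets are an acceptable substitute for the per-run grouping, the built-in window() function is the one form of windowing that streaming Datasets do support. A minimal sketch, assuming timestamp is a proper TimestampType column; the 10-minute window and watermark sizes here are arbitrary choices, and the semantics differ from the original query:

 import static org.apache.spark.sql.functions.*;

 // Aggregate per device over fixed 10-minute event-time windows instead of
 // per run of status = 'false'. The watermark is required for append mode.
 Dataset<Row> byTimeWindow = datasetMap
         .withWatermark("timestamp", "10 minutes")
         .groupBy(window(col("timestamp"), "10 minutes"), col("deviceId"))
         .agg(avg("currentTemperature").alias("averageT"),
              min("timestamp").alias("minTime"),
              max("timestamp").alias("maxTime"),
              count(lit(1)).alias("countFrame"));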

更新

A definitive statement of this limitation comes from Tathagata Das, a key/core committer on Spark Streaming.


I am getting the same error. Did you ever find a solution?
I did not. I implemented it with a different approach from the start: I used a custom aggregation.
You mean a user-defined aggregation?
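
For illustration, one possible shape of the custom aggregation mentioned above is a typed Aggregator applied through groupByKey, which streaming Datasets support. This is only a sketch of that idea, not the commenter's actual code: TempStats and TempStatsAggregator are hypothetical names, and RawModel is assumed to expose getDeviceId() and getCurrentTemperature() getters.

 import java.io.Serializable;
 import org.apache.spark.api.java.function.MapFunction;
 import org.apache.spark.sql.*;
 import org.apache.spark.sql.expressions.Aggregator;
 import scala.Tuple2;

 // Mutable aggregation buffer; a bean so Encoders.bean can serialize it
 public static class TempStats implements Serializable {
     private double sum;
     private long count;
     public double getSum() { return sum; }
     public void setSum(double sum) { this.sum = sum; }
     public long getCount() { return count; }
     public void setCount(long count) { this.count = count; }
 }

 // Typed aggregator: accumulates a running temperature sum and row count per key
 public static class TempStatsAggregator extends Aggregator<RawModel, TempStats, TempStats> {
     @Override public TempStats zero() { return new TempStats(); }
     @Override public TempStats reduce(TempStats b, RawModel e) {
         b.setSum(b.getSum() + e.getCurrentTemperature());
         b.setCount(b.getCount() + 1);
         return b;
     }
     @Override public TempStats merge(TempStats b1, TempStats b2) {
         b1.setSum(b1.getSum() + b2.getSum());
         b1.setCount(b1.getCount() + b2.getCount());
         return b1;
     }
     @Override public TempStats finish(TempStats b) { return b; }
     @Override public Encoder<TempStats> bufferEncoder() { return Encoders.bean(TempStats.class); }
     @Override public Encoder<TempStats> outputEncoder() { return Encoders.bean(TempStats.class); }
 }

 // Usage: group the stream by device and apply the aggregator as a typed column
 Dataset<Tuple2<String, TempStats>> stats = datasetMap
         .groupByKey((MapFunction<RawModel, String>) RawModel::getDeviceId, Encoders.STRING())
         .agg(new TempStatsAggregator().toColumn().name("stats"));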