
Java: How do I select the first record from a group?

Tags: java, apache-spark, apache-spark-sql

I have a list/array of records and I am using `explode` to pull the data out of the list. Using Spark SQL from Java, I want to select the first record from the exploded result.

// json is the Dataset<Row> holding the parsed input
Dataset<Row> ds = json.select(
    json.col("*"),
    explode(json.col("records.record.newrecord")).as("newrecord"));
ds = ds.select(
    ds.col("EVENT_SEQ"),
    ds.col("newrecord").apply("event").as("EVENTTYPE"));
Desired output:

+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

I have seen some documentation that suggests using `Column.apply`, but I have not found enough guidance to get started with it.

This is exactly what the `first` aggregate function combined with the `groupBy` operator is for:

val ds = Seq(
  ("5a694d77-bc65-4bf...", 0),
  ("5a694d77-bc65-4bf...", 0)
).toDF("EVENT_SEQ", "EVENTTYPE")
scala> ds.show
+--------------------+---------+
|           EVENT_SEQ|EVENTTYPE|
+--------------------+---------+
|5a694d77-bc65-4bf...|        0|
|5a694d77-bc65-4bf...|        0|
+--------------------+---------+

scala> ds.groupBy("EVENT_SEQ").agg(first("EVENTTYPE")).show
+--------------------+-----------------------+
|           EVENT_SEQ|first(EVENTTYPE, false)|
+--------------------+-----------------------+
|5a694d77-bc65-4bf...|                      0|
+--------------------+-----------------------+
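Since the question asks about Java, the same `groupBy`/`first` approach translates directly to the Java API. Below is a minimal, self-contained sketch; the `FirstPerGroup` class name and the local `SparkSession` setup are illustrative, not part of the original question:

```java
// Group by EVENT_SEQ and keep the first EVENTTYPE seen in each group,
// mirroring the Scala answer above.
import static org.apache.spark.sql.functions.first;

import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class FirstPerGroup {

    // Returns one row per EVENT_SEQ, carrying the first EVENTTYPE encountered.
    public static Dataset<Row> firstPerGroup(Dataset<Row> ds) {
        return ds.groupBy("EVENT_SEQ")
                 .agg(first("EVENTTYPE").as("EVENTTYPE"));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("first-per-group")
                .master("local[*]")
                .getOrCreate();

        StructType schema = new StructType()
                .add("EVENT_SEQ", DataTypes.StringType)
                .add("EVENTTYPE", DataTypes.IntegerType);

        Dataset<Row> ds = spark.createDataFrame(Arrays.asList(
                RowFactory.create("5a694d77-bc65-4bf...", 0),
                RowFactory.create("5a694d77-bc65-4bf...", 0)), schema);

        firstPerGroup(ds).show();

        spark.stop();
    }
}
```

Note that `first` is non-deterministic unless the data within each group has a defined order; if you need a specific record (e.g. the earliest by timestamp), sort first or use a window function with `row_number` instead.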