Apache Spark - How to compute summary statistics on a SchemaRDD?

I want to compute summary statistics over each user's log count. The RDD I'm working with comes from:

// Load the gzipped JSON logs from S3 and let Spark SQL infer a schema.
val fileRdd = sc.textFile("s3n://<bucket>/project/20141215/log_type1/log_type1.*.gz")
val jsonRdd = sqlContext.jsonRDD(fileRdd)

// Register the SchemaRDD so it can be queried with SQL.
jsonRdd.registerTempTable("log_type1")
val result = sqlContext.sql("SELECT user_id, COUNT(*) AS the_count FROM log_type1 GROUP BY user_id ORDER BY the_count DESC")
How can I do this? There doesn't seem to be anything like MapRDD in Spark's API. The output I'm after would look something like:

Mean: 3.245 (user-id-abcdef)
Min: 1 (user-id-mmmnnnkkk)
Median: 15 (user-id-xyzrpg)
Max: 950 (user-id-123456789)
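One possible approach, sketched below on the assumption that the result SchemaRDD above holds (user_id, the_count) rows: pull the counts out as an RDD[Double] and call stats(), which Spark provides on RDDs of doubles and which returns a StatCounter with count, mean, min, max, and stdev. Spark has no built-in median, so the sketch sorts the counts and looks up the middle element; the variable names (counts, sorted, n) are illustrative, not part of any API.

// Extract the per-user counts as doubles; stats() gives a StatCounter.
val counts = result.map(row => row.getLong(1).toDouble)
val stats = counts.stats()
println(s"Mean: ${stats.mean}")
println(s"Min:  ${stats.min}")
println(s"Max:  ${stats.max}")

// No built-in median: sort the counts, pair each with its index,
// and look up the middle one (fine for moderate data sizes).
val sorted = counts.sortBy(identity).zipWithIndex().map(_.swap)
val n = sorted.count()
val median = sorted.lookup(n / 2).head
println(s"Median: $median")

Attributing min/max back to a specific user_id, as in the sample output, falls out of the ordering: since the SQL sorts by the_count DESC, result.first() is the user with the max count, and the last row is the min.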