How do I extract counts with a GROUP BY using the JavaRDD class?
I want to use the JavaRDD class to extract the user name and a count (the number of times each user performed each event). How do I create the JavaRDD object?
Here is a snapshot of my data:
{
"_id" : ObjectId("57b3e6d1cab823158a06cafe"),
"app" : {
"clientIp" : "111.0.0.1",
"event" : {
"event_name" : "MAX_SEARCH",
"appId" : 1,
"userName" : "Alex"
}
}
}
The expected result is:
Alex MAX_SEARCH 5
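How can I achieve this?
Suppose you have multiple records in a text file, as shown below, and you want to get the user name, the event name, and the event count: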
{
"_id": ObjectId("57b3e6d1cab823158a06cafe"),
"app": {
"clientIp": "111.0.0.1",
"event": {
"event_name": "MAX_SEARCH",
"appId": 1,
"userName": "Alex"
}
}
},
{
"_id": ObjectId("57b3e6d1cab823158a06cafe"),
"app": {
"clientIp": "111.0.0.1",
"event": {
"event_name": "MAX_SEARCH",
"appId": 1,
"userName": "Alex"
}
}
},
{
"_id": ObjectId("57b3e6d1cab823158a01cafe"),
"app": {
"clientIp": "111.0.0.1",
"event": {
"event_name": "MAX_SEARCH",
"appId": 1,
"userName": "Hokam"
}
}
},
{
"_id": ObjectId("57b3e6d1cab823158a02cafe"),
"app": {
"clientIp": "111.0.0.1",
"event": {
"event_name": "MIN_SEARCH",
"appId": 1,
"userName": "Hokam"
}
}
}
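One caveat about the records above: `ObjectId("...")` is mongo shell notation, not strict JSON, so a strict JSON parser will reject it. If the parser you use does, one option is to rewrite those values as plain strings before parsing. The following is only a minimal sketch of that idea; the class name `MongoShellJson` and the method `stripObjectIds` are made up for illustration and are not part of any library.

```java
import java.util.regex.Pattern;

public class MongoShellJson {
    // Matches ObjectId("...") and captures the quoted hex string (assumed format).
    private static final Pattern OBJECT_ID =
            Pattern.compile("ObjectId\\(\\s*(\"[0-9a-fA-F]*\")\\s*\\)");

    /** Replaces every ObjectId("...") wrapper with its quoted string value. */
    public static String stripObjectIds(String text) {
        return OBJECT_ID.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String record = "{ \"_id\": ObjectId(\"57b3e6d1cab823158a06cafe\"), \"app\": {} }";
        // The _id value becomes an ordinary JSON string.
        System.out.println(stripObjectIds(record));
    }
}
```

The result can then be passed to whatever JSON parser you use in place of the raw file contents.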
The code snippet below reads the data from the file above, creates an RDD from it, and produces the expected result:
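import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.List;
import net.minidev.json.JSONObject;
import net.minidev.json.JSONValue;
import org.apache.commons.io.FileUtils;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

public class UserEventLogger {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("UserEventLogger").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read the whole file and wrap it in [ ] so that it parses as a single JSON array.
        String fileData = FileUtils.readFileToString(new File("/data/pocs/text-file.json"), StandardCharsets.UTF_8);
        @SuppressWarnings("unchecked")
        List<JSONObject> jsonObjects = (List<JSONObject>) JSONValue.parse("[" + fileData + "]");
        JavaRDD<JSONObject> jsonRdd = sc.parallelize(jsonObjects);
        // Key each record by "userName event_name" with an initial count of 1.
        jsonRdd.mapToPair(new PairFunction<JSONObject, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(JSONObject appObj) throws Exception {
                JSONObject app = (JSONObject) appObj.get("app");
                JSONObject event = (JSONObject) app.get("event");
                String username = event.getAsString("userName");
                String eventName = event.getAsString("event_name");
                return new Tuple2<String, Integer>(username + " " + eventName, 1);
            }
        // Sum the counts of all records that share a key.
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        // Print one "userName event_name count" line per key.
        }).foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> t) throws Exception {
                System.out.println(t._1 + " " + t._2);
            }
        });
        sc.stop();
    }
}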
Output (the order of the lines may vary between runs):
Hokam MAX_SEARCH 1
Alex MAX_SEARCH 2
Hokam MIN_SEARCH 1
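To sanity-check these counts without starting Spark, the same "build a composite key, then sum per key" logic can be sketched with plain Java streams. This is not Spark API and not a replacement for the RDD code; the class `EventCountSketch` and the method `countByKey` are hypothetical names used only for this illustration, and the records are hard-coded as (userName, event_name) pairs instead of being parsed from the file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class EventCountSketch {
    /** Counts how often each "userName event_name" key occurs in the records. */
    public static Map<String, Long> countByKey(List<String[]> records) {
        return records.stream()
                // Build the same composite key that the mapToPair step produces.
                .map(r -> r[0] + " " + r[1])
                // Group equal keys and count them; a TreeMap keeps the keys sorted.
                .collect(Collectors.groupingBy(k -> k, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The four sample records as (userName, event_name) pairs.
        List<String[]> records = Arrays.asList(
                new String[] {"Alex", "MAX_SEARCH"},
                new String[] {"Alex", "MAX_SEARCH"},
                new String[] {"Hokam", "MAX_SEARCH"},
                new String[] {"Hokam", "MIN_SEARCH"});
        countByKey(records).forEach((key, count) -> System.out.println(key + " " + count));
    }
}
```

The grouping collector plays the role that `reduceByKey` plays in the Spark job: every record contributes 1 to the total of its key.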