How do I group data by a field in Spark using Java?

I want to read two columns from a database, group them by the first column, and then insert the result into another table using Spark. My program is written in Java. I tried the following:
public static void aggregateSessionEvents(org.apache.spark.SparkContext sparkContext) {
    com.datastax.spark.connector.japi.rdd.CassandraJavaPairRDD<String, String> logs = javaFunctions(sparkContext)
            .cassandraTable("dove", "event_log", mapColumnTo(String.class), mapColumnTo(String.class))
            .select("session_id", "event");
    logs.groupByKey();
    com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions(logs)
            .writerBuilder("dove", "event_aggregation", null).saveToCassandra();
    sparkContext.stop();
}
This gives me an error:
The method cassandraTable(String, String, RowReaderFactory<T>) in the type SparkContextJavaFunctions is not applicable for the arguments (String, String, RowReaderFactory<String>, mapColumnTo(String.class))
My dependencies are:
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>2.0.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>2.0.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.datastax.spark</groupId>
        <artifactId>spark-cassandra-connector_2.10</artifactId>
        <version>1.6.2</version>
    </dependency>
</dependencies>
How can I resolve this?

Change this:
.cassandraTable("dove", "event_log", mapColumnTo(String.class), mapColumnTo(String.class))
to:

.cassandraTable("dove", "event_log", mapColumnTo(String.class))
You are passing an extra argument. To group the data by a field, do the following:
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapRowTo;

import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

public static void aggregateSessionEvents(SparkContext sparkContext) {
    JavaRDD<Data> datas = javaFunctions(sparkContext).cassandraTable("test", "data",
            mapRowTo(Data.class));
    JavaPairRDD<String, String> pairDatas = datas
            .mapToPair(data -> new Tuple2<>(data.getKey(), data.getValue()));
    // reduceByKey is a lazy transformation that returns a new RDD; keep the
    // result, otherwise nothing is ever computed.
    JavaPairRDD<String, String> aggregated =
            pairDatas.reduceByKey((value1, value2) -> value1 + "," + value2);
    sparkContext.stop();
}
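The reduceByKey call above folds all values that share a key into one comma-joined string. As a quick sanity check of that semantics without a Spark cluster, the same fold can be reproduced with plain Java streams; the session/event values below are made up for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupByKeyDemo {
    // Mirrors reduceByKey((v1, v2) -> v1 + "," + v2) on local data:
    // the merge function of Collectors.toMap plays the role of the reduce lambda,
    // joining values that collide on the same key.
    static Map<String, String> aggregate(List<Map.Entry<String, String>> rows) {
        return rows.stream().collect(Collectors.toMap(
                Map.Entry::getKey,
                Map.Entry::getValue,
                (v1, v2) -> v1 + "," + v2));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = List.of(
                Map.entry("s1", "login"),
                Map.entry("s2", "view"),
                Map.entry("s1", "click"));
        // s1's two events are joined; s2 keeps its single event.
        System.out.println(aggregate(rows));
    }
}
```

Note that, unlike this local map, Spark's reduceByKey gives no ordering guarantee across partitions, so the join order of the values may differ.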
This gives me an error: Type mismatch: cannot convert from CassandraTableScanJavaRDD to CassandraJavaPairRDD
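The original goal was also to insert the grouped result into another table, which the snippet above stops short of. A hedged sketch of how that write-back might look with the connector's Java API; the target table `event_aggregation` and its column names `session_id`/`events` are assumptions, and this fragment needs a live Cassandra cluster to run:

```java
import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapTupleToRow;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.someColumns;

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Assuming pairDatas is the JavaPairRDD<String, String> built above:
JavaPairRDD<String, String> aggregated =
        pairDatas.reduceByKey((v1, v2) -> v1 + "," + v2);

// Tuple2 fields are mapped positionally onto the selected columns.
javaFunctions(aggregated)
        .writerBuilder("dove", "event_aggregation",
                mapTupleToRow(String.class, String.class))
        .withColumnSelector(someColumns("session_id", "events"))
        .saveToCassandra();
```

Calling saveToCassandra here is what finally triggers the computation, since reduceByKey alone is lazy.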