Map Spark DataFrame column values using JavaRDD<Row>
I created two DataFrames from a SQLContext:
DataFrame edge_dataframe = SharedSC.getEdgeDataFrame("EDGE_RDD", -1234, sc.getSparkContext());
DataFrame vertex_dataframe = SharedSC.getVertexDataFrame("VERTEX_RDD", -1234, sc.getSparkContext());
- Vertex DataFrame
- Edge DataFrame
JavaRDD<Row> ff = vertex_dataframe.javaRDD().zipWithIndex().map(new SerialiFunJRdd<Tuple2<Row, Long>, Row>() {
public Row call(Tuple2<Row, Long> rowLongTuple2) throws Exception {
return RowFactory.create(rowLongTuple2._1().getString(0), rowLongTuple2._2());
}
});
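Now I want to replace the values in the src and dest columns of the edge DataFrame with these long IDs. How can I do that? Thanks in advance for any help.

I did it using a List. This may not be the best approach, but it solved my problem.
First, I mapped the JavaRDD<Row> to a JavaRDD<Tuple2<Long, String>>: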
JavaRDD<Tuple2<java.lang.Long, String>> vertex_javardd = ff.map(new SerializableFunction<Row, Tuple2<java.lang.Long, String>>() {
public Tuple2<java.lang.Long, String> call(Row row) throws Exception {
return new Tuple2<java.lang.Long, String>(row.getLong(1), row.getString(0));
}
});
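The list-based idea can be sketched in plain Java, without Spark, to show the lookup step on its own. This is only an illustration under assumptions: the class `EdgeIdMapping` and its methods `indexVertices` and `remapEdges` are hypothetical names, the vertex names are assumed to be unique and small enough to hold in memory on the driver, and each edge is modeled as a two-element `String[]` of (src, dest).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: plain collections stand in for the
// collected vertex and edge rows.
public class EdgeIdMapping {

    // Build a name -> index lookup. The index is the 0-based position in the
    // list, which is what zipWithIndex assigns in element order.
    static Map<String, Long> indexVertices(List<String> vertexNames) {
        Map<String, Long> ids = new HashMap<>();
        for (int i = 0; i < vertexNames.size(); i++) {
            ids.put(vertexNames.get(i), (long) i);
        }
        return ids;
    }

    // Replace each (src, dest) name pair with the corresponding long ids.
    static List<long[]> remapEdges(List<String[]> edges, Map<String, Long> ids) {
        List<long[]> result = new ArrayList<>();
        for (String[] edge : edges) {
            Long src = ids.get(edge[0]);
            Long dst = ids.get(edge[1]);
            if (src == null || dst == null) {
                throw new IllegalArgumentException(
                        "Edge refers to an unknown vertex: " + Arrays.toString(edge));
            }
            result.add(new long[] {src, dst});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> vertices = Arrays.asList("a", "b", "c");
        List<String[]> edges = Arrays.asList(
                new String[] {"a", "b"},
                new String[] {"c", "a"});
        for (long[] e : remapEdges(edges, indexVertices(vertices))) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Because the whole lookup lives in driver memory, this only suits small vertex sets. For larger graphs, joining the edge DataFrame against the indexed vertex DataFrame on the name column is the usual alternative.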