Converting a Scala flatten UDF for use in Java with Spark

Tags: java, scala, apache-spark

I have been trying to implement Function1 in Java to build a UDF that flattens a Seq[Map[String,Int]].

Scala code:

 val joinMap = udf { values: Seq[Map[String, Int]] => values.flatten.toMap }
Spark DF schema:

root
 |-- rid: integer (nullable = true)
 |-- lid: integer (nullable = true)
 |-- mapArray: array (nullable = true)
 |    |-- element: map (containsNull = true)
 |    |    |-- key: double
 |    |    |-- value: integer (valueContainsNull = true)
How can I implement a similar UDF in Java?

Java code:

 UDF1 mode1 = new UDF1<WrappedArray<Map<Double, Integer>>, String>() {
     @Override
     public String call(WrappedArray<Map<Double, Integer>> maps) throws Exception {
         // Only the outer Seq is converted; the elements are still Scala maps.
         List<Map<Double, Integer>> lis =
                 (List<Map<Double, Integer>>) JavaConverters.seqAsJavaListConverter(maps).asJava();

         System.out.println(lis.get(1));

         // Fails at runtime: the elements are scala.collection.immutable.Map,
         // so the Map.Entry accessors throw ClassCastException here.
         java.util.Map<Double, Integer> a = lis.stream()
                 .flatMap(map -> map.entrySet().stream())
                 .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
         return "";
     }
 };

When converting a Seq[T] to a List[T], JavaConverters only converts the Seq itself into a Java List; it does not convert the elements of the Seq into Java-compatible objects. Each element of the converted list must therefore be converted separately.

In this case, from scala.collection.immutable.Map to java.util.Map:

import scala.collection.JavaConverters;
import scala.collection.JavaConversions;

UDF1 mode1 = new UDF1<WrappedArray<scala.collection.immutable.Map<Double, Integer>>, Map<Double, Integer>>() {
        @Override
        public Map<Double, Integer> call(WrappedArray<scala.collection.immutable.Map<Double, Integer>> maps) throws Exception {
            List<scala.collection.immutable.Map<Double, Integer>> lis = JavaConverters.seqAsJavaListConverter(maps).asJava();
            Map<Double, Integer> flattenMap = new HashMap<>();
            lis.forEach(map -> {
                Map<Double, Integer> m = JavaConversions.mapAsJavaMap((scala.collection.immutable.Map) map);
                flattenMap.putAll(m);
            });
            return flattenMap;
        }
    };

This code has an unchecked-assignment warning, which could probably be fixed.
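Outside Spark, the merge step itself is plain Java. A minimal standalone sketch of the same flatten-and-merge semantics (later maps overwrite earlier keys via putAll, matching what the UDF body above does with the converted elements):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlattenDemo {
    public static void main(String[] args) {
        // Stand-in for the converted list of maps from the UDF input.
        List<Map<Double, Integer>> maps = List.of(
                Map.of(1.0, 10, 2.0, 20),
                Map.of(2.0, 99, 3.0, 30));

        // Same merge as the UDF body: putAll lets later maps win on key clashes.
        Map<Double, Integer> flattened = new HashMap<>();
        maps.forEach(flattened::putAll);

        System.out.println(flattened); // merged map; HashMap order is not guaranteed
    }
}
```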


Comments:
- Why do you want to write a UDF for it? You can use … — @koiralo
- @koiralo I want to merge the maps into one.
- @koiralo the Spark function flatten works only on arrays, not on an array of maps.
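For what it's worth, on Spark 2.4+ the same merge can also be expressed without a custom UDF by folding the array with the built-in aggregate and map_concat higher-order functions. An untested sketch, assuming the column name mapArray from the schema above (note that on Spark 3.x, map_concat raises an error on duplicate keys unless spark.sql.mapKeyDedupPolicy is set to LAST_WIN):

```java
public class MergeExprSketch {
    public static void main(String[] args) {
        // Fold the array of maps into one map using Spark SQL higher-order functions.
        String mergeExpr =
                "aggregate(mapArray, cast(map() as map<double,int>), (acc, m) -> map_concat(acc, m))";

        // Usage (requires a SparkSession and a DataFrame df; not run here):
        // df.withColumn("merged", org.apache.spark.sql.functions.expr(mergeExpr));

        System.out.println(mergeExpr);
    }
}
```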