How to pass complex external variables, e.g. a Map's values, to a UDF from the driver program in Spark with Java?
I ran into a big problem when I needed to pass a Java HashMap to a UDF. The UDF is defined as a separate class rather than as an inline lambda function, and only an inline lambda can access variables of the enclosing scope, such as one defined as a broadcast variable. I had already opened another question for this purpose, but it received no satisfactory answer: people only offered answers built around simple UDFs that can be written as small lambdas and can therefore reach the broadcast variable in the driver program.
As I detailed in that other question, I started looking into typedLit, which seemed to me the way forward. However, there is almost no documentation on this method for Java, although examples and tutorials exist for Scala. So my question is: how do I use typedLit to pass the value of a complex variable to a UDF?
I found the answer to this problem through a long and hard route, and I am posting it here to help others who may face the same problem. The official Spark Javadocs give the typedLit method definition as follows:
typedLit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
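To use this object (the Scala object TypeDefs referred to below) in my Java Maven project, I followed the structure given in a blog post.
The dependencies I had to include in the pom were as follows: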
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.7</version>
</dependency>
<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.2.2</version>
    <executions>
        <execution>
            <id>compile</id>
            <goals>
                <goal>compile</goal>
            </goals>
            <phase>compile</phase>
        </execution>
        <execution>
            <id>test-compile</id>
            <goals>
                <goal>testCompile</goal>
            </goals>
            <phase>test-compile</phase>
        </execution>
        <execution>
            <phase>process-resources</phase>
            <goals>
                <goal>compile</goal>
            </goals>
        </execution>
    </executions>
</plugin>
I defined a dummy map to send to my UDF:
Map<String, String> testMap = new HashMap<>();
testMap.put("1", "One");
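I could not pass the MapString val to the UDF directly, because the compiler kept complaining that it has private access in TypeDefs. From a linked reference, I found out that in Java a Scala val is accessed through a method call, like a getter, and not directly through the val itself.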
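The access rule above can be imitated in plain Java. The following is only a rough analogy, not the actual output of the Scala compiler: TypeDefsLike and ValAccessDemo are hypothetical names, and an ordinary java.util.Map stands in for the type tag. It shows why reading the name as a field is rejected while calling it as a method works.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a Scala object with a val, as seen from Java.
final class TypeDefsLike {
    // The backing field is private, so other classes cannot read it directly.
    private static final Map<String, String> MapString = buildMap();

    // A method with the same name as the val plays the role of the getter.
    static Map<String, String> MapString() {
        return MapString;
    }

    private static Map<String, String> buildMap() {
        Map<String, String> m = new HashMap<>();
        m.put("1", "One");
        return Collections.unmodifiableMap(m);
    }

    private TypeDefsLike() { }
}

class ValAccessDemo {
    public static void main(String[] args) {
        // Map<String, String> m = TypeDefsLike.MapString;  // does not compile: private access
        Map<String, String> m = TypeDefsLike.MapString();    // compiles: goes through the method
        System.out.println("Value of 1: " + m.get("1"));
    }
}
```

By the same reasoning, the val in the real Scala object would be referenced from Java with parentheses, i.e. as TypeDefs.MapString().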
I defined TestUDF as follows:
import java.util.Map;
import org.apache.spark.sql.api.java.UDF1;
import scala.collection.JavaConverters;

public class TestUDF implements UDF1<scala.collection.immutable.Map<String, String>, String> {
    @Override
    public String call(scala.collection.immutable.Map<String, String> t1) throws Exception {
        // Print the Scala map exactly as it arrives in the UDF.
        System.out.println(t1);
        // Wrap the Scala map as a java.util.Map so it can be queried with the usual Java API.
        Map<String, String> javaMap = JavaConverters.mapAsJavaMapConverter(t1).asJava();
        System.out.println("Value of 1: " + javaMap.get("1"));
        // This test UDF only prints the lookup; it does not produce a column value.
        return null;
    }
}
This finally worked, and I could access the map from my UDF.
This helped me a lot, thanks. However, I ran into a problem caused by a Scala version incompatibility: the UDF receives an immutable map, but the scalaMap I had defined was a mutable one. I am leaving this note for anyone who faces the same problem in the future.