How do I extract an RDD's data into a Java ArrayList?

The obvious idea is to add the elements one at a time:

ArrayList<String> myvalues = new ArrayList<String>();

myRdd.foreach(new VoidFunction<org.apache.spark.sql.api.java.Row>() {
    @Override
    public void call(org.apache.spark.sql.api.java.Row row) throws Exception {
        myvalues.add(row.getString(0)); // Say I need only the first element
    }
});
This, and the other alternatives I tried, keeps throwing org.apache.spark.SparkException: Task not serializable. I simplified the function further. Clearly I am doing something illogical:

LOG.info("Let's see..");
queryRdd.foreach(new VoidFunction<org.apache.spark.sql.api.java.Row>() {
  @Override
  public void call(org.apache.spark.sql.api.java.Row row) throws Exception {
      LOG.info("Value is : "+row.getString(0));
  }
});
There must be a simple way to do this. Here is the stack trace for reference:

2015-10-08 10:16:48 INFO  UpdateStatementTemplateImpl:141 - Lets see.. 
2015-10-08 10:16:48 WARN  GenericExceptionMapper:20 - Error while executing service
org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1476)
        at org.apache.spark.rdd.RDD.foreach(RDD.scala:781)
        at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:313)
        at org.apache.spark.sql.api.java.JavaSchemaRDD.foreach(JavaSchemaRDD.scala:42)
        at com.simility.cassandra.template.DeviceIDTemplateImpl.test(DeviceIDTemplateImpl.java:144)
        at com.kumbay.service.admin.BusinessEntityService.testSignal(BusinessEntityService.java:1801)
        at com.kumbay.service.admin.BusinessEntityService$$FastClassByCGLIB$$157ddd50.invoke(<generated>)
        at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:701)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
        at org.springframework.transaction.interceptor.TransactionInterceptor$1.proceedWithInvocation(TransactionInterceptor.java:96)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:260)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke(MethodSecurityInterceptor.java:64)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:634)

I assume LOG and myvalues live in an enclosing class. The whole class is therefore serialized as part of what the call "captures", and that is not possible.
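The capture described above can be reproduced without Spark. The following is a minimal sketch using only the JDK: RowAction is a hypothetical stand-in for Spark's VoidFunction, and Service is a hypothetical non-serializable class standing in for the class that owns LOG and myvalues. It shows only the Java serialization behaviour that, presumably, sits behind the "Task not serializable" error.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ClosureCaptureDemo {

    /** Hypothetical stand-in for Spark's VoidFunction: a serializable callback. */
    interface RowAction extends Serializable {
        void call(String value);
    }

    /** Hypothetical enclosing class. Note that it is NOT Serializable. */
    static class Service {
        private final List<String> myvalues = new ArrayList<>();

        /** The anonymous class uses a field, so it keeps a hidden reference to this Service. */
        RowAction capturingAction() {
            return new RowAction() {
                @Override
                public void call(String value) {
                    myvalues.add(value); // shorthand for Service.this.myvalues.add(value)
                }
            };
        }
    }

    /** A static nested class has no hidden reference to an enclosing instance. */
    static class StandaloneAction implements RowAction {
        @Override
        public void call(String value) {
            System.out.println("Value is : " + value);
        }
    }

    /** Returns true if Java serialization accepts the object, false if it rejects it. */
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The anonymous class drags the non-serializable Service along with it.
        System.out.println(isSerializable(new Service().capturingAction())); // prints false
        // The static nested class carries nothing but itself.
        System.out.println(isSerializable(new StandaloneAction()));          // prints true
    }
}
```

The callback interface itself is serializable in both cases; the difference is only what each instance drags along with it.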

Solution: first, replace LOG with a simple System.out.println and see whether that works.

Second, create a copy of the members you use in call:

public void call(...) {
    Log log = LOG; // or
    ArrayList<String> inside = myvalues;
    inside.add(...);
}
Third, never use an ArrayList inside foreach: the function runs on different nodes, and each node sees its own ArrayList. So you will never get the result you expect.
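Why each node sees its own list can be sketched with plain Java serialization. The example below makes no Spark calls: AddToList is a hypothetical task class, and shipToWorker is a hypothetical helper that round-trips the task through bytes, which is roughly what happens when a task is sent to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class SerializedCopyDemo {

    /** Hypothetical task: a serializable action that appends to a list it holds. */
    static class AddToList implements Serializable {
        final ArrayList<String> target;

        AddToList(ArrayList<String> target) {
            this.target = target;
        }

        void call(String value) {
            target.add(value);
        }
    }

    /** Serializes and deserializes the task, producing an independent copy. */
    static AddToList shipToWorker(AddToList task) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (AddToList) in.readObject();
        }
    }

    /** Returns the size of the original list after the shipped copy adds one element. */
    static int driverSizeAfterRemoteAdd() throws IOException, ClassNotFoundException {
        ArrayList<String> myvalues = new ArrayList<>();
        AddToList workerCopy = shipToWorker(new AddToList(myvalues));
        workerCopy.call("a"); // mutates the copy's list, not myvalues
        return myvalues.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(driverSizeAfterRemoteAdd()); // prints 0
    }
}
```

The element lands in the deserialized copy's list, so the original list on the calling side stays empty.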


Instead, use rdd.collect(…) to collect the results!
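As a hedged sketch of that approach: in Spark's Java API, collect() takes no arguments and returns a java.util.List, so the usual pattern is to map each row to the value you need and then collect. The example assumes Spark is on the classpath and Java 8 lambdas are available. It uses JavaRDD<List<String>> as a stand-in for the question's rows, and the class name, method name, and sample data are all illustrative. With the question's Row type, the mapping function would presumably call row.getString(0) instead of row.get(0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CollectToArrayList {

    /** Extracts the first field of every row and returns the values to the driver. */
    static ArrayList<String> firstFields(JavaRDD<List<String>> rows) {
        // map runs on the executors; collect() brings the mapped values back as a List.
        List<String> collected = rows.map(row -> row.get(0)).collect();
        // Copy into an ArrayList so the caller gets exactly that type.
        return new ArrayList<>(collected);
    }

    public static void main(String[] args) {
        // Local master and app name are placeholders for this sketch.
        JavaSparkContext sc = new JavaSparkContext("local[2]", "collect-demo");
        try {
            JavaRDD<List<String>> rows = sc.parallelize(Arrays.asList(
                    Arrays.asList("a", "1"),
                    Arrays.asList("b", "2")));
            System.out.println(firstFields(rows));
        } finally {
            sc.stop();
        }
    }
}
```

Because the lambda lives in a static method and references no fields, it captures no enclosing instance. Keep in mind that collect() pulls the whole result into driver memory, so it only suits results small enough to fit there.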

Could you provide sample code showing what to pass as the argument to the rdd.collect(…) method so that I get an ArrayList as output?