
Apache Spark: passing parameters to the jar when using SparkLauncher


I am trying to create an executable jar that uses SparkLauncher to run another jar containing a data-transformation task (that second jar is the one that creates the Spark session).

I need to pass Java parameters (some Java arrays) to the jar executed by the launcher:

import org.apache.spark.launcher.SparkLauncher

object launcher {
  @throws[Exception]
  // How do I pass parameters to spark_job_with_spark_session.jar?
  def main(args: Array[String]): Unit = {
    val handle = new SparkLauncher()
      .setAppResource("spark_job_with_spark_session.jar")
      .setVerbose(true)
      .setMaster("local[*]")
      .setConf(SparkLauncher.DRIVER_MEMORY, "4g")
      .launch()
  }
}
How can I do this?

> need to pass java parameters (some java arrays)

Launching this way is equivalent to executing spark-submit, so you cannot pass Java objects directly. Use addAppArgs(...) to pass application arguments as strings, and parse them inside your application.
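To illustrate the "parse them in your application" half, here is a minimal sketch of the receiving side. The class name and the flag-pair convention are illustrative assumptions, not from the original post; the launched jar's main simply sees the strings given to addAppArgs.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgParseSketch {

    // Collect "--key value" pairs into a map; a real application might
    // prefer a CLI-parsing library instead of hand-rolled parsing.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if (args[i].startsWith("--")) {
                opts.put(args[i].substring(2), args[i + 1]);
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        // In the launched jar, `args` is what the launcher passed via addAppArgs.
        Map<String, String> opts = parseArgs(new String[] {
                "--input", "/data/in", "--output", "/data/out"});
        System.out.println(opts.get("input"));   // prints /data/in
        System.out.println(opts.get("output"));  // prints /data/out
    }
}
```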

/*
 * To change this license header, choose License Headers in Project Properties.
 * To change this template file, choose Tools | Templates
 * and open the template in the editor.
 */
package com.meow.woof.meow_spark_launcher.app;

import com.meow.woof.meow_spark_launcher.common.TaskListener;
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

/**
 *
 * @author hahattpro
 */
public class ExampleSparkLauncherApp {

    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
                .setAppResource("/home/cpu11453/workplace/experiment/SparkPlayground/target/scala-2.11/SparkPlayground-assembly-0.1.jar")
                .setMainClass("me.thaithien.playground.ConvertToCsv")
                .setMaster("spark://cpu11453:7077")
                .setConf(SparkLauncher.DRIVER_MEMORY, "3G")
                .addAppArgs("--input" , "/data/download_hdfs/data1/2019_08_13/00/", "--output", "/data/download_hdfs/data1/2019_08_13/00_csv_output/")
                .startApplication(new TaskListener());

        handle.addListener(new SparkAppHandle.Listener() {
            @Override
            public void stateChanged(SparkAppHandle handle) {
                System.out.println("state changed: " + handle.getState());
            }

            @Override
            public void infoChanged(SparkAppHandle handle) {
                System.out.println("info changed: " + handle.getState());
            }
        });

        System.out.println(handle.getState().toString());

        while (!handle.getState().isFinal()) {
            // poll until the job reaches a final state
            Thread.sleep(1000L);
        }
    }
}

The sample code above is a working example.

Thanks for your answer! Damn, I need to pass large Java arrays from the application's memory, so it has to be fast, and I don't think converting the arrays to strings will work... Do you have any other suggestions for how to achieve this? Maybe serialize them to a file, then read it back and pass only the path as an argument? Hmm, writing to a file would also be a very expensive operation :(

I suspect you don't have many options here. It is a separate JVM with no shared memory, so I think the only viable optimizations are a) fast serialization and b) a fast filesystem (an in-memory fs). Unless you want to access the memory directly by address ;)

@hi-zir thanks for your answer, it helped me in my own case, where I needed to pass parameters to the driver application (+1). In my case I used the Apache Commons CLI library to parse my command-line arguments. Worked like a charm. Thanks.
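The file-handoff idea discussed in the comments can be sketched as follows: the launcher JVM serializes the array to a temporary file, passes only the path via addAppArgs, and the launched job reads the array back. Class and method names here are illustrative assumptions; whether this is fast enough depends on the filesystem, as the comments note.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ArrayHandoffSketch {

    // Launcher side: write the array, then pass path.toString() via addAppArgs.
    static Path writeArray(long[] data) throws IOException {
        Path path = Files.createTempFile("spark-args-", ".bin");
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            out.writeInt(data.length);      // length header
            for (long v : data) {
                out.writeLong(v);
            }
        }
        return path;
    }

    // Launched-job side: reconstruct the array from the path received in args.
    static long[] readArray(Path path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(path)))) {
            long[] data = new long[in.readInt()];
            for (int i = 0; i < data.length; i++) {
                data[i] = in.readLong();
            }
            return data;
        }
    }

    public static void main(String[] args) throws IOException {
        long[] original = {1L, 2L, 3L};
        Path path = writeArray(original);
        long[] roundTripped = readArray(path);
        System.out.println(java.util.Arrays.toString(roundTripped)); // [1, 2, 3]
        Files.delete(path);
    }
}
```

For an in-memory handoff, pointing the temp file at a RAM-backed filesystem (e.g. a tmpfs mount) avoids disk I/O without changing this code.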