Apache Spark: Executing PySpark code from a Java/Scala application
Tags: apache-spark, pyspark, pyspark-dataframes


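Is there a way to execute PySpark code from a Java/Scala application on an existing SparkSession?

Specifically, given PySpark code that receives and returns a PySpark DataFrame, is there a way to submit it to the Java/Scala SparkSession and get the output DataFrame back? For example: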

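// Python allows wildcard imports only at module level, so the import sits outside my_func.
// The parentheses let the chained calls span several lines.
String pySparkCode = "from pyspark.sql.functions import *\n" +
    "def my_func(input_df):\n" +
    "    return (input_df.selectExpr(...)\n" +
    "            .drop(...)\n" +
    "            .withColumn(...))\n";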

SparkSession spark = SparkSession.builder().master("local").getOrCreate();

Dataset<Row> inputDF = spark.sql("SELECT * from my_table");

// <SUBMIT_PYSPARK_METHOD> is a placeholder for the method being asked about
Dataset<Row> outputDf = spark.<SUBMIT_PYSPARK_METHOD>(pySparkCode, inputDF);
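
As far as I know, SparkSession has no public Java/Scala method that accepts PySpark source directly. One possible workaround is sketched below. It does not reuse the existing session: the Python code runs as a separate Spark application, and the DataFrames are exchanged through Parquet files. The class name `PySparkDriverSketch`, the helpers `buildDriverScript` and `buildCommand`, and all paths are hypothetical. The sketch also assumes `spark-submit` is on the PATH and that both applications can read and write the same storage locations.

```java
import java.util.List;

// Hypothetical helper; none of these names come from Spark itself.
class PySparkDriverSketch {

    // Wraps the user's function in a standalone PySpark script that reads
    // the input from Parquet (argv[1]), applies the function, and writes
    // the result to Parquet (argv[2]).
    static String buildDriverScript(String userCode, String funcName) {
        return String.join("\n",
            "import sys",
            "from pyspark.sql import SparkSession",
            "",
            userCode.stripTrailing(),
            "",
            "spark = SparkSession.builder.getOrCreate()",
            "input_df = spark.read.parquet(sys.argv[1])",
            funcName + "(input_df).write.mode(\"overwrite\").parquet(sys.argv[2])",
            "spark.stop()",
            "");
    }

    // Command line for launching the script; assumes spark-submit is on PATH.
    static List<String> buildCommand(String scriptPath, String inputPath, String outputPath) {
        return List.of("spark-submit", scriptPath, inputPath, outputPath);
    }

    public static void main(String[] args) {
        // Example user code with a concrete body, for illustration only.
        String pySparkCode = "def my_func(input_df):\n" +
            "    return input_df.limit(10)\n";
        System.out.println(buildDriverScript(pySparkCode, "my_func"));
        System.out.println(buildCommand("driver.py", "/tmp/in.parquet", "/tmp/out.parquet"));
    }
}
```

On the Java side, the remaining steps would be to save the input with `inputDF.write().parquet(inputPath)`, write the generated script to a file, run the command with `ProcessBuilder`, and load the result with `spark.read().parquet(outputPath)`. The round trip through files adds I/O cost and a second application start-up, so this fits batch-style use better than interactive calls.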
