
Scala: Using a Column Value as a Column Name


I need some help with DataFrames.

I have a mathematical formula stored in one column, and I want to apply that formula to the DataFrame and save the result in another column.

DF:

+-------+---------------------+----------+----------+
|Id     |form                 |double_1  |double_2  |
+-------+---------------------+----------+----------+
|Math1  |double_1 + double_2  |12.0      |12.0      |
|Math2  |double_2 - double_1  |1000.0    |10.0      |
|Math3  |double_1 + double_1  |12.02     |19.02     |
+-------+---------------------+----------+----------+

The result I want is:

+-------+---------------------+----------+----------+--------------+
|Id     |form                 |double_1  |double_2  |double_result |
+-------+---------------------+----------+----------+--------------+
|Math1  |double_1 + double_2  |12.0      |12.0      |24.0          |
|Math2  |double_2 - double_1  |1000.0    |10.0      |-990.0        |
|Math3  |double_1 + double_1  |12.02     |19.02     |24.04         |
+-------+---------------------+----------+----------+--------------+
But I don't know how to take the value of the form column and interpret it as a column expression.

I tried this:

val resultDF= metricsDF.withColumn("double_result",expr(col("form").toString()))
The output is:

+-------+---------------------+----------+----------+---------------------+
|Id     |form                 |double_1  |double_2  |double_result        |
+-------+---------------------+----------+----------+---------------------+
|Math1  |double_1 + double_2  |12.0      |12.0      |double_1 + double_2  |
|Math2  |double_2 - double_1  |1000.0    |10.0      |double_2 - double_1  |
|Math3  |double_1 + double_1  |12.02     |19.02     |double_1 + double_1  |
+-------+---------------------+----------+----------+---------------------+
How can I do this? I have tried other options, but with no results.


Thank you all!

I don't think Spark or Scala has built-in support for interpreting math expressions that reference columns by name. That is also why your attempt copies the expression text instead of evaluating it: expr() needs a literal expression string known when the query is built, and col("form").toString() just yields the string "form", so you end up selecting the form column itself.

You can, however, use the Java scripting API via ScriptEngineManager:

// Sample data (import spark.implicits._ for toDF and the $-column syntax):
import spark.implicits._

val df = Seq(
  ("Math1", "double_1 + double_2", 12.0, 12.0),
  ("Math2", "double_2 - double_1", 1000.0, 10.0),
  ("Math3", "double_1 + double_1", 12.02, 19.02)
).toDF("Id", "form", "double_1", "double_2")

import javax.script.SimpleBindings
import javax.script.ScriptEngineManager
import java.util.Map
import java.util.HashMap

// Evaluate the expression string with the JVM's JavaScript engine,
// exposing double_1 and double_2 as script variables.
def calculateFunction = (mathExpression: String, double_1: Double, double_2: Double) => {
    val vars: Map[String, Object] = new HashMap[String, Object]()
    vars.put("double_1", double_1.asInstanceOf[Object])
    vars.put("double_2", double_2.asInstanceOf[Object])
    // Note: creating an engine per row is expensive; consider reusing one per partition.
    val engine = new ScriptEngineManager().getEngineByExtension("js")
    val result = engine.eval(mathExpression, new SimpleBindings(vars))
    result.asInstanceOf[Double]
}

val calculateUDF = spark.udf.register("calculateFunction",calculateFunction)

val resultDF = df.withColumn("double_result",calculateUDF($"form",$"double_1",$"double_2"))

resultDF.show

+-----+-------------------+--------+--------+-------------+
|   Id|               form|double_1|double_2|double_result|
+-----+-------------------+--------+--------+-------------+
|Math1|double_1 + double_2|    12.0|    12.0|         24.0|
|Math2|double_2 - double_1|  1000.0|    10.0|       -990.0|
|Math3|double_1 + double_1|   12.02|   19.02|        24.04|
+-----+-------------------+--------+--------+-------------+

I'd suggest replacing the script-engine expression evaluator with a more native Scala library if you can find one, but this works. A hand-rolled alternative is sketched below.
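This is a minimal sketch, assuming the form column only ever combines double_1 and double_2 with + and - (as in the sample data); evalForm is a hypothetical helper, not part of Spark or the standard library:

import org.apache.spark.sql.functions.udf

// Hypothetical left-to-right evaluator for forms like "double_1 + double_2".
// Only + and - are handled; anything else is rejected.
def evalForm(form: String, d1: Double, d2: Double): Double = {
  def value(name: String): Double = name match {
    case "double_1" => d1
    case "double_2" => d2
    case other      => throw new IllegalArgumentException(s"unknown variable: $other")
  }
  val tokens = form.trim.split("\\s+")   // e.g. Array("double_1", "+", "double_2")
  var acc = value(tokens(0))
  var i = 1
  while (i < tokens.length - 1) {
    tokens(i) match {
      case "+" => acc += value(tokens(i + 1))
      case "-" => acc -= value(tokens(i + 1))
      case op  => throw new IllegalArgumentException(s"unsupported operator: $op")
    }
    i += 2
  }
  acc
}

val evalFormUDF = udf(evalForm _)
val resultDF2 = df.withColumn("double_result", evalFormUDF($"form", $"double_1", $"double_2"))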


You can also do it like this, but only if the number of distinct forms is not too large:

import org.apache.spark.sql.functions.{expr, lit, when}

// collect all distinct forms to the driver
val forms = df.select($"form").distinct().as[String].collect()

// build up one chained conditional expression over all distinct forms;
// when(lit(false), null) just seeds the fold with a Column of the right type
val columnExpression = forms.foldLeft(when(lit(false), null)) {
  case (acc, v) => acc.when($"form" === v, expr(v))
}

val resultDF = df
  .withColumn("double_result", columnExpression)
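For the three sample forms, the column built by the foldLeft is equivalent to one chained CASE WHEN, so every branch is evaluated natively by Catalyst and no UDF is involved. The expansion below is an illustration, not output copied from Spark:

// Roughly what columnExpression expands to for the sample data
// (plus the initial WHEN false THEN NULL seed branch):
//   CASE WHEN form = 'double_1 + double_2' THEN double_1 + double_2
//        WHEN form = 'double_2 - double_1' THEN double_2 - double_1
//        WHEN form = 'double_1 + double_1' THEN double_1 + double_1
//   END
resultDF.show(false)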


Comments:

This is a similar question:

You saved my day! It works perfectly. I'm sorry I don't have the reputation, but if someone could give him +1...

Thanks for your help. I tried to get this working in my project, but ".as[String]" throws a NullPointerException. I tried converting to String inside the foldLeft, but I get genericExceptionSchema. @IvanR, which Spark version are you using? Did you import spark.implicits? Hi @Raphael, I'm using Spark 1.6, and yes, I imported it. I can't figure out that null pointer.
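On the Spark 1.6 NullPointerException mentioned above, a hedged sketch of a workaround: it assumes the problem lies in the Dataset .as[String] conversion or in null form values, and stays in the plain Row API instead.

// Collect distinct forms without the Dataset .as[String] conversion,
// dropping null forms first so expr() never sees a null string.
val forms = df.select("form").na.drop().distinct().collect().map(_.getString(0))

The rest of the foldLeft answer then works unchanged.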