Scala: using a column value as a column name
I need some help with DataFrames. I have a math formula in one column, and I want to apply that formula to the DataFrame and save the result in another column. The result I would like to get is:
+-------+---------------------+----------+----------+--------------+
|Id     |form                 |double_1  |double_2  |double_result |
+-------+---------------------+----------+----------+--------------+
|Math1  |double_1 + double_2  |12.0      |12.0      |24.0          |
|Math2  |double_2 - double_1  |1000.0    |10.0      |-990.0        |
|Math3  |double_1 + double_1  |12.02     |19.02     |24.04         |
+-------+---------------------+----------+----------+--------------+
But I don't know how to take the value of the form column and interpret it as column names.
I tried this:
val resultDF = metricsDF.withColumn("double_result", expr(col("form").toString()))
The output is:
+-------+---------------------+----------+----------+---------------------+
|Id     |form                 |double_1  |double_2  |double_result        |
+-------+---------------------+----------+----------+---------------------+
|Math1  |double_1 + double_2  |12.0      |12.0      |double_1 + double_2  |
|Math2  |double_2 - double_1  |1000.0    |10.0      |double_2 - double_1  |
|Math3  |double_1 + double_1  |12.02     |19.02     |double_1 + double_1  |
+-------+---------------------+----------+----------+---------------------+
How can I do this? I've tried other options, but without success.
Thank you all!

I don't think Spark or Scala provides built-in support for interpreting math expressions that use variable names, but you can use the Java ScriptEngineManager API (javax.script):
import spark.implicits._ // for toDF and the $ column syntax

// Sample data:
val df = Seq(
  ("Math1", "double_1 + double_2", 12.0, 12.0),
  ("Math2", "double_2 - double_1", 1000.0, 10.0),
  ("Math3", "double_1 + double_1", 12.02, 19.02)
).toDF("Id", "form", "double_1", "double_2")

import javax.script.{ScriptEngineManager, SimpleBindings}
import java.util.{HashMap, Map}

// Evaluate the expression string with the JVM's JavaScript engine,
// binding the two column values as variables.
def calculateFunction = (mathExpression: String, double_1: Double, double_2: Double) => {
  val vars: Map[String, Object] = new HashMap[String, Object]()
  vars.put("double_1", double_1.asInstanceOf[Object])
  vars.put("double_2", double_2.asInstanceOf[Object])
  val engine = new ScriptEngineManager().getEngineByExtension("js")
  val result = engine.eval(mathExpression, new SimpleBindings(vars))
  result.asInstanceOf[Double]
}

val calculateUDF = spark.udf.register("calculateFunction", calculateFunction)
val resultDF = df.withColumn("double_result", calculateUDF($"form", $"double_1", $"double_2"))
resultDF.show
+-----+-------------------+--------+--------+-------------+
|   Id|               form|double_1|double_2|double_result|
+-----+-------------------+--------+--------+-------------+
|Math1|double_1 + double_2|    12.0|    12.0|         24.0|
|Math2|double_2 - double_1|  1000.0|    10.0|       -990.0|
|Math3|double_1 + double_1|   12.02|   19.02|        24.04|
+-----+-------------------+--------+--------+-------------+
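One caveat not mentioned in the original answer: the Nashorn JavaScript engine was deprecated in JDK 11 and removed in JDK 15, so getEngineByExtension("js") can return null on newer JVMs. A small defensive check, as a sketch:

// Fail fast with a clear message when no JS engine is on the classpath
val engine = new ScriptEngineManager().getEngineByExtension("js")
require(engine != null, "No JavaScript ScriptEngine available on this JVM")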
I'd suggest replacing the math expression parser with a more native Scala library if you can find one, but this works.
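If the forms really are limited to two column names joined by + or -, a tiny hand-rolled evaluator avoids the JS engine entirely. A minimal sketch under that assumption (evalSimpleForm and simpleUDF are illustrative names, not from the original answer):

// Hypothetical evaluator for two-operand forms like "double_1 + double_2"
def evalSimpleForm(form: String, double_1: Double, double_2: Double): Double = {
  val vars = Map("double_1" -> double_1, "double_2" -> double_2)
  form.trim.split("\\s+") match {
    case Array(l, "+", r) => vars(l) + vars(r)
    case Array(l, "-", r) => vars(l) - vars(r)
    case _                => sys.error(s"Unsupported form: $form")
  }
}

val simpleUDF = spark.udf.register("evalSimpleForm", evalSimpleForm _)
val sketchDF = df.withColumn("double_result", simpleUDF($"form", $"double_1", $"double_2"))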
You can do this, but only if the number of distinct forms is not too large:
import org.apache.spark.sql.functions.{when, lit, expr}

// collect all distinct forms
val forms = df.select($"form").distinct().as[String].collect()

// build up a chained CASE WHEN expression: for each distinct form,
// parse it once with expr() and apply it where the form column matches
val columnExpression = forms.foldLeft(when(lit(false), null)) {
  case (acc, form) => acc.when($"form" === form, expr(form))
}

val resultDF = df.withColumn("double_result", columnExpression)
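For the three sample forms, the foldLeft above simply builds the chained conditions you would otherwise write by hand. A sketch of its expansion, to make the mechanics concrete:

val explicitExpression =
  when(lit(false), null)
    .when($"form" === "double_1 + double_2", expr("double_1 + double_2"))
    .when($"form" === "double_2 - double_1", expr("double_2 - double_1"))
    .when($"form" === "double_1 + double_1", expr("double_1 + double_1"))

Because expr() parses each distinct form once at planning time, this approach stays on Spark's native engine and needs no UDF, unlike the ScriptEngineManager version.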
Here is a similar question:
You saved my day! It works perfectly. I'm sorry I don't have the reputation, but if someone could give him a +1...
Thanks for your help. I'm trying to get it working in my project, but ".as[String]" throws a NullPointerException. I tried converting to String inside the foldLeft, but got a generic schema exception.
@IvanR, which Spark version are you using? Did you import spark.implicits?
Hi @Raphael, I'm using Spark 1.6, and yes, I imported it. I can't figure out that null pointer.
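About the Spark 1.6 NullPointerException in the comments: ".as[String]" relies on Dataset encoders, which were still experimental in 1.6, so one possible workaround (an assumption, untested on that version) is to collect the distinct forms through the RDD API instead:

// Bypass the Dataset encoder: collect distinct forms as plain strings
val forms = df.select("form").distinct().rdd.map(_.getString(0)).collect()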