
Scala: How to create a column from an existing column in Spark


I am trying to add a new column to a Dataset, computed from an existing column:

val test = Seq("aaxxx", "bbxxx", "ccxxx").toDF
test.show prints:
+-----+
|value|
+-----+
|aaxxx|
|bbxxx|
|ccxxx|
+-----+
Here is what I want:

+-----+----+
|value|val2|
+-----+----+
|aaxxx|aa  |
|bbxxx|bb  |
|ccxxx|cc  |
+-----+----+
To achieve this, I tried:

val column = test.select("value").as[String].map(e => e.substring(0, 2)).col("value")
test.withColumn("value2", column)
But I got:

org.apache.spark.sql.AnalysisException: Resolved attribute(s) value#10 missing from value#1 in operator !Project [value#1, value#10 AS value2#17]. Attribute(s) with the same name appear in the operation: value. Please check if the right attribute(s) are used.;;
!Project [value#1, value#10 AS value2#17]
+- LocalRelation [value#1]

Can anyone see what is wrong with my code, or suggest a better way to get the desired result?

Use withColumn with substring.

Hope it helps.
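As a minimal sketch of that suggestion (assuming the test DataFrame from the question): the original attempt fails because the Column returned by map belongs to a different Dataset's logical plan, whereas withColumn with substring stays on the same DataFrame. Note that Spark SQL's substring is 1-based (a start position of 0 is treated like 1).

```scala
import org.apache.spark.sql.functions.{substring, col}

// Sketch, assuming the `test` DataFrame from the question.
// Spark SQL's substring is 1-based: positions 1..2 give the first two characters.
val withVal2 = test.withColumn("val2", substring(col("value"), 1, 2))
withVal2.show()
```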


You can use the Spark SQL function substring to get the first two characters, like this:

import org.apache.spark.sql.functions.{substring, col}
val newDf = test.withColumn("val2", substring(col("value"), 0, 2))


Here are three ways to do it:

1) Use the function substring, which must be imported:

import org.apache.spark.sql.functions.{substring}

test.withColumn("value2", substring($"value", 0, 2))
2) Call the substr method on the Column object:

test.withColumn("value2", $"value".substr(0, 2))
3) Use a SQL expression:

test.selectExpr("value", "substring(value, 0, 2) AS value2")
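The three variants above can be put together in one runnable sketch. The SparkSession setup lines below are an assumption for local testing, not part of the original answer; I also use a start position of 1 rather than 0, since Spark SQL's substring is documented as 1-based (0 happens to behave like 1, so the original snippets work too).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{substring, col}

// Assumed local session for illustration only.
val spark = SparkSession.builder().master("local[*]").appName("substr-demo").getOrCreate()
import spark.implicits._

val test = Seq("aaxxx", "bbxxx", "ccxxx").toDF("value")

// 1) The substring function from org.apache.spark.sql.functions.
val a = test.withColumn("value2", substring(col("value"), 1, 2))

// 2) The substr method on a Column.
val b = test.withColumn("value2", $"value".substr(1, 2))

// 3) A SQL expression via selectExpr.
val c = test.selectExpr("value", "substring(value, 1, 2) AS value2")

// All three produce the same rows: (aaxxx, aa), (bbxxx, bb), (ccxxx, cc).
```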
