Add a column whose value depends on the length of another column, using Scala
Tags: scala, apache-spark

My task is to compute the length of each column value and add a message to an "ErrorMsg" column. I can filter the records by length, but I cannot append a message in a new column. For example, I only want to find the invalid records, each with a message in the new "ErrorMsg" column.

Record length = 4
Input DataFrame:
+------+
| value|
+------+
|Pra   |
|Akshay|
|  Raju|
|Shakti|
|xyz   |
+------+
Output DataFrame:
+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+
Here "Raju" is a valid record (its length is 4), so it goes to the valid records and gets no message.

The following gives the desired result:
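// Assumes a SparkSession named `spark` is in scope, as in spark-shell
import org.apache.spark.sql.functions.{length, lit, not, when}
import spark.implicits._

val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
df
  .filter(not(length($"value") === 4)) // drop valid rows, i.e. those of length 4
  .withColumn("ErrorMsg", when(length($"value") > lit(4), "Greater than total length").otherwise("Less Than total Length"))
  .show(10000, false) // print up to 10000 rows without truncating cell values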
+------+-------------------------+
|value |ErrorMsg                 |
+------+-------------------------+
|Pra   |Less Than total Length   |
|Akshay|Greater than total length|
|Shakti|Greater than total length|
|xyz   |Less Than total Length   |
+------+-------------------------+
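If you also need the valid records rather than discarding them up front, one possible variation is to add the column first and split afterwards. The sketch below is only an illustration under a few assumptions: the object name `LengthCheck`, the helper `withErrorMsg`, and the parameter `expectedLength` are hypothetical names that do not appear in the question, and a local SparkSession is created just to make the example self-contained. It relies on `when` without a final `otherwise`, which leaves the column null for rows that match no branch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, length, when}

object LengthCheck {
  // Hypothetical helper: adds ErrorMsg to every row.
  // Rows whose length equals expectedLength match neither branch,
  // so their ErrorMsg stays null because there is no otherwise().
  def withErrorMsg(df: DataFrame, expectedLength: Int): DataFrame =
    df.withColumn(
      "ErrorMsg",
      when(length(col("value")) > expectedLength, "Greater than total length")
        .when(length(col("value")) < expectedLength, "Less Than total Length")
    )

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the app name is arbitrary
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("length-check")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("Pra", "Akshay", "Raju", "Shakti", "xyz").toDF("value")
    val checked = withErrorMsg(df, expectedLength = 4)

    // Invalid records carry a message; valid ones have a null ErrorMsg
    val invalid = checked.filter(col("ErrorMsg").isNotNull)
    val valid = checked.filter(col("ErrorMsg").isNull).drop("ErrorMsg")

    invalid.show(false)
    valid.show(false)
    spark.stop()
  }
}
```

With the sample data, `invalid` should hold the same four rows as the output above, and `valid` should hold only "Raju". Whether this is preferable to filtering first depends on whether the valid rows are needed downstream.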