Scala: how to extract the numeric part from a string column in Spark and update the same column value after a mathematical operation
I'm new to Scala Spark and am trying to perform the operation below on a DataFrame column. I have a column containing alphanumeric values and want to update these values based on a mathematical operation.
+--------------------------------------+
|Error |
+--------------------------------------+
|value: 0.25 Does not meet Requirements|
|value: 0.5 Does not meet Requirements|
|value: 0.75 Does not meet Requirements|
|value: 0.66 Does not meet Requirements|
|value: 0.34 Does not meet Requirements|
+--------------------------------------+
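I want to apply the numeric operation (1 - {numeric value from the string}) and update the column with the new values.
For example, I want the output to look like this: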
+--------------------------------------+
|Error |
+--------------------------------------+
|value: 0.75 Does not meet Requirements|
|value: 0.5 Does not meet Requirements|
|value: 0.25 Does not meet Requirements|
|value: 0.34 Does not meet Requirements|
|value: 0.66 Does not meet Requirements|
+--------------------------------------+
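Any help would be appreciated. I have learned about the Column methods that use regular expressions, but I have no clue how to perform the mathematical operation.
Regards,
Mahi
Assuming you have multiple columns: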
+------+--------------------+
| col1| Error|
+------+--------------------+
| first|value: 0.25 Does ...|
|second|value: 0.5 Does ...|
| third|value: 0.75 Does ...|
|fourth|value: 0.66 Does ...|
| fifth|value: 0.34 Does ...|
+------+--------------------+
You can update the Error column using split and mkString:
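import org.apache.spark.sql.{Encoder, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// 1 - number, computed with BigDecimal to avoid floating-point artifacts
val subtractFromOne: String => String = number =>
  (BigDecimal(1) - BigDecimal(number)).toString()

// Split on spaces, replace the second token (e.g. "0.25") and join the tokens again
val transform: String => String = s => s.split(' ') match {
  case Array(first, number, rest @ _*) if number.matches("""\d+(\.\d+)?""") =>
    (Seq(first, subtractFromOne(number)) ++ rest).mkString(" ")
  case _ => s // unexpected format (no numeric second token): return it unchanged
}

// Encoder for the resulting Rows; RowEncoder is an internal Spark API, check it matches your Spark version
implicit val enc: Encoder[Row] = RowEncoder(df.schema)
df
  .map(row => Row(row(0), transform(row.getString(1))))
  .show()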
This will output:
+------+--------------------------------------+
| col1| Error|
+------+--------------------------------------+
| first|value: 0.75 Does not meet Requirements|
|second|value: 0.5 Does not meet Requirements|
| third|value: 0.25 Does not meet Requirements|
|fourth|value: 0.34 Does not meet Requirements|
| fifth|value: 0.66 Does not meet Requirements|
+------+--------------------------------------+
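Because transform is plain Scala, its string logic can be checked without a Spark session. The sketch below is a regex-based variant, not the answer's split-based approach. It looks for the first unsigned decimal number anywhere in the message instead of assuming the number is the second token. The names SubtractFromOne and update are hypothetical. It also assumes the numbers have no sign or exponent.

```scala
import scala.util.matching.Regex

object SubtractFromOne {
  // Assumption: the value is an unsigned decimal such as "0.25" (no sign, no exponent)
  private val Number: Regex = """\d+(?:\.\d+)?""".r

  /** Replaces the first number x in `s` with 1 - x, computed with BigDecimal.
    * Strings that contain no number are returned unchanged. */
  def update(s: String): String =
    Number.findFirstMatchIn(s) match {
      case Some(m) =>
        // BigDecimal keeps the decimal digits exact; with Double, 1.0 - 0.66 is not exactly 0.34
        val result = BigDecimal(1) - BigDecimal(m.matched)
        s.substring(0, m.start) + result.toString + s.substring(m.end)
      case None => s
    }
}
```

For example, SubtractFromOne.update("value: 0.25 Does not meet Requirements") returns "value: 0.75 Does not meet Requirements". Inside Spark it could be wrapped in a UDF, just like transform.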
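BigDecimal is used to preserve the scale, so that for example 1 - 0.66 gives exactly 0.34 rather than a floating-point approximation.
Thanks for your reply. I just wanted to check how to pass the column name in the second piece of code: suppose Error is the third column of my DataFrame, and I want the other columns to appear as well, alongside the updated Error column.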
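One way to address this follow-up, not taken from the original answer, is to update the column by name. You can do that with withColumn and a UDF, so every other column is kept no matter where Error sits. The sketch below assumes Spark 2.x or later. The object UpdateErrorColumn and the helper updateByName are hypothetical names, and the string logic mirrors transform above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

object UpdateErrorColumn {
  // Same logic as `transform`: replace the second space-separated token x
  // with 1 - x (using BigDecimal); leave null or unexpected strings unchanged.
  def subtractFromOne(s: String): String = s match {
    case null => null
    case _ =>
      s.split(' ') match {
        case Array(first, number, rest @ _*) if number.matches("""\d+(\.\d+)?""") =>
          (Seq(first, (BigDecimal(1) - BigDecimal(number)).toString) ++ rest).mkString(" ")
        case _ => s
      }
  }

  // Hypothetical helper: rewrites only the column named `colName`.
  // withColumn replaces an existing column in place, so the other columns
  // and their order are preserved.
  def updateByName(df: DataFrame, colName: String): DataFrame = {
    val subtractUdf = udf((s: String) => subtractFromOne(s))
    df.withColumn(colName, subtractUdf(col(colName)))
  }
}
```

Usage: UpdateErrorColumn.updateByName(df, "Error").show(false). Because the column is addressed by name, no Row index or RowEncoder is needed.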