Scala: How to extract the numeric part from a string column in Spark and update the same column after a math operation

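I am new to Scala Spark and am trying to perform the following operation on a DataFrame column. I have a column containing alphanumeric values, and I want to update those values based on a math operation: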

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.25 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.75 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    +--------------------------------------+
I want to perform the numeric operation (1 - {numeric value from the string}) and update the column with the new value.

For example, I want the output to look like this:

    +--------------------------------------+
    |Error                                 |
    +--------------------------------------+
    |value: 0.75 Does not meet Requirements|
    |value: 0.5  Does not meet Requirements|
    |value: 0.25 Does not meet Requirements|
    |value: 0.34 Does not meet Requirements|
    |value: 0.66 Does not meet Requirements|
    +--------------------------------------+
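Any help would be appreciated. I have learned about the column methods that use regular expressions, but I have no idea how to perform the math operation.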

Regards
Mahi
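Since the question mentions the regex-based column methods, here is a minimal sketch of that route using only Spark's built-in functions (regexp_extract, rlike, concat, round, when) and no UDF. The object name RegexErrorUpdate and the pattern are illustrative assumptions: it assumes every value starts with "value: " followed by a number, and it rounds the result to two decimals to hide binary floating-point noise. Written against Spark 3.x.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit, regexp_extract, round, when}

object RegexErrorUpdate {
  // Assumed layout of every value: "value: <number><rest of message>"
  val ValuePattern = "^value: ([0-9]+(?:\\.[0-9]+)?)(.*)$"

  // Builds the replacement column: 1 - <number>, with the surrounding text kept
  def oneMinusValue(c: Column): Column = {
    val number = regexp_extract(c, ValuePattern, 1).cast("double") // group 1: the number
    val rest   = regexp_extract(c, ValuePattern, 2)                // group 2: everything after it
    // round(..., 2) assumes two decimal places and removes rounding noise such as in 1.0 - 0.66
    val flipped = concat(lit("value: "), round(lit(1.0) - number, 2).cast("string"), rest)
    // Values that do not match the pattern are left unchanged
    when(c.rlike(ValuePattern), flipped).otherwise(c)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("regex-error-update").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("first", "value: 0.25 Does not meet Requirements"),
      ("second", "value: 0.5  Does not meet Requirements"),
      ("fourth", "value: 0.66 Does not meet Requirements")
    ).toDF("col1", "Error")

    // withColumn replaces Error in place; col1 is carried along unchanged
    df.withColumn("Error", oneMinusValue(col("Error"))).show(false)
    spark.stop()
  }
}
```

Because the rest of the message is captured as one group, the double space after 0.5 survives unchanged.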

Suppose you have multiple columns:

+------+--------------------+
|  col1|               Error|
+------+--------------------+
| first|value: 0.25 Does ...|
|second|value: 0.5  Does ...|
| third|value: 0.75 Does ...|
|fourth|value: 0.66 Does ...|
| fifth|value: 0.34 Does ...|
+------+--------------------+
You can update the Error column using split and mkString:

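import org.apache.spark.sql.{Encoder, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder

// 1 - number, computed with BigDecimal so the result keeps its decimal scale
val subtractFromOne: Double => String = number =>
  (BigDecimal(1.0) - BigDecimal(number)).toString()

// split(' ') keeps double spaces as empty tokens, so mkString(" ") restores them
val transform: String => String = s => s.split(' ') match {
  case Array(first, number, rest @ _*) =>
    (Seq(first, subtractFromOne(number.toDouble)) ++ rest).mkString(" ")
  case _ => s // fewer than two tokens: return the string unchanged
}

// Encoder for the mapped Rows; the schema does not change
implicit val enc: Encoder[Row] = RowEncoder(df.schema)

df
  .map(row => Row(row(0), transform(row.getString(1)))) // col1 as-is, Error transformed
  .show(false) // false: print full strings instead of truncating them to 20 characters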
This prints:

+------+--------------------------------------+
|  col1|                                 Error|
+------+--------------------------------------+
| first|value: 0.75 Does not meet Requirements|
|second|value: 0.5  Does not meet Requirements|
| third|value: 0.25 Does not meet Requirements|
|fourth|value: 0.34 Does not meet Requirements|
| fifth|value: 0.66 Does not meet Requirements|
+------+--------------------------------------+
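The string logic itself does not need Spark, so it can be checked in plain Scala. This sketch copies the answer's two functions as methods of a hypothetical object ErrorValueCheck and runs them on strings from the question; it also prints the same subtraction done with plain Double arithmetic for comparison, which is the case BigDecimal guards against.

```scala
object ErrorValueCheck {
  // Same arithmetic as the answer's subtractFromOne. Scala's BigDecimal(d: Double)
  // goes through the Double's decimal string, so 0.66 stays 0.66 and 1.0 - 0.66 is exactly 0.34.
  def subtractFromOne(number: Double): String =
    (BigDecimal(1.0) - BigDecimal(number)).toString

  // Same split/mkString logic as the answer's transform
  def transform(s: String): String = s.split(' ') match {
    case Array(first, number, rest @ _*) =>
      (Seq(first, subtractFromOne(number.toDouble)) ++ rest).mkString(" ")
    case _ => s // fewer than two tokens: unchanged
  }

  def main(args: Array[String]): Unit = {
    val samples = Seq(
      "value: 0.25 Does not meet Requirements",
      "value: 0.5  Does not meet Requirements", // the double space is preserved
      "value: 0.66 Does not meet Requirements")
    samples.map(transform).foreach(println)
    // Plain Double subtraction, for comparison: not exactly 0.34
    println(1.0 - 0.66)
  }
}
```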

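BigDecimal is used to preserve the scale, so that results such as 1 - 0.66 come out as 0.34 rather than a binary floating-point approximation.

Thank you for your reply. I just wanted to check how to pass the column name in the second piece of code: suppose Error is the third column of my DataFrame, and I want the other columns to appear along with the updated Error column.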
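One way to address the follow-up is to skip the positional map and RowEncoder and apply the same logic to a column selected by name with withColumn, which replaces that column in place and keeps every other column. This is a minimal sketch, not the answerer's code: the object name UpdateErrorByName, the helper updateByName and the sample columns are hypothetical, and wrapping the logic in a UDF is a swapped-in technique.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UpdateErrorByName {
  // The answer's split/mkString logic, unchanged apart from being a method
  def transform(s: String): String = s.split(' ') match {
    case Array(first, number, rest @ _*) =>
      val flipped = (BigDecimal(1.0) - BigDecimal(number.toDouble)).toString
      (Seq(first, flipped) ++ rest).mkString(" ")
    case _ => s // fewer than two tokens: return the value unchanged
  }

  // UDF wrapper; null cells are passed through instead of throwing
  val transformUdf = udf((s: String) => if (s == null) null else transform(s))

  // Rewrites only the named column; the other columns keep their values and order
  def updateByName(df: DataFrame, colName: String): DataFrame =
    df.withColumn(colName, transformUdf(col(colName)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("update-by-name").getOrCreate()
    import spark.implicits._

    // Hypothetical frame with Error as the third column
    val df = Seq(
      ("first", "a", "value: 0.25 Does not meet Requirements"),
      ("second", "b", "value: 0.5  Does not meet Requirements")
    ).toDF("col1", "col2", "Error")

    updateByName(df, "Error").show(false)
    spark.stop()
  }
}
```

If you prefer to keep the map version, row.fieldIndex("Error") returns the position of the named column, so each row can be rebuilt with Row.fromSeq(row.toSeq.updated(i, transform(row.getString(i)))) where i is that index.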