Scala: How to use regexp_replace in Spark

I'm new to Spark and want to perform an operation on one column of a DataFrame so that every , in that column is replaced with a .

Assume there is a DataFrame x with a column x4:

x4
1,3435
1,6566
-0,34435

I would like the output to be:

x4
1.3435
1.6566
-0.34435

The code I am using is:

import org.apache.spark.sql.Column
def replace = regexp_replace((x.x4,1,6566:String,1.6566:String)x.x4)

But I get the following error:

import org.apache.spark.sql.Column
<console>:1: error: ')' expected but '.' found.
       def replace = regexp_replace((train_df.x37,0,160430299:String,0.160430299:String)train_df.x37)

Any help with the syntax, the logic, or any other suitable approach would be much appreciated.

Assuming x4 is a string column, here is a reproducible example:

import org.apache.spark.sql.functions.regexp_replace

val df = spark.createDataFrame(Seq(
  (1, "1,3435"),
  (2, "1,6566"),
  (3, "-0,34435"))).toDF("Id", "x4")

The syntax is regexp_replace(str, pattern, replacement), which translates to:

df.withColumn("x4New", regexp_replace(df("x4"), "\\,", ".")).show
+---+--------+--------+
| Id|      x4|   x4New|
+---+--------+--------+
|  1|  1,3435|  1.3435|
|  2|  1,6566|  1.6566|
|  3|-0,34435|-0.34435|
+---+--------+--------+
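
Since regexp_replace treats its pattern as a Java regular expression, the pattern can be tried out without a Spark session by using plain String.replaceAll, which relies on the same java.util.regex engine. The following is only a minimal sketch (the object name CommaPatternCheck is made up for illustration); it suggests that the comma does not actually need escaping:

```scala
object CommaPatternCheck {
  // Sample values copied from the x4 column above.
  val samples: Seq[String] = Seq("1,3435", "1,6566", "-0,34435")

  // "\\," (an escaped comma) and "," match the same text in a Java regex,
  // because a comma has no special meaning there.
  val escaped: Seq[String] = samples.map(_.replaceAll("\\,", "."))
  val plain: Seq[String] = samples.map(_.replaceAll(",", "."))

  def main(args: Array[String]): Unit = {
    println(plain.mkString(" "))   // prints: 1.3435 1.6566 -0.34435
    println(escaped == plain)      // prints: true
  }
}
```

If that holds, regexp_replace(df("x4"), ",", ".") should behave the same as the escaped version shown above.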

We can also use the map method for this transformation:

// Needs spark.implicits._ in scope for the tuple encoder (spark-shell imports it automatically)
df.map(each => (each.getInt(0), each.getString(1).replace(",", ".")))
  .toDF("Id", "x4")
  .show

Output:

+---+--------+
| Id|      x4|
+---+--------+
|  1|  1.3435|
|  2|  1.6566|
|  3|-0.34435|
+---+--------+

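Can I replace several different characters rather than just the comma? For example, I want to replace commas, periods, and exclamation marks with some other character.

Do you want to replace multiple special characters with a single character? Yes, that is possible.

I tried that, but it didn't work. Could you show me how?

You can try something like: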

regexp_replace(df.col, "[\\?,\\,\\$]", "")
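
For the multi-character case raised in the comments, a character class is one way to match several punctuation marks at once. The sketch below again uses String.replaceAll as a stand-in for Spark so it can run on its own; the object name MultiCharReplace, the input strings, and the choice of "_" as the replacement are all assumptions made for illustration.

```scala
object MultiCharReplace {
  // Inside [...] the characters , . ! ? $ are treated literally,
  // so none of them needs a backslash.
  val pattern: String = "[,.!?$]"

  // Replace every character matched by the class with an underscore.
  def clean(s: String): String = s.replaceAll(pattern, "_")

  def main(args: Array[String]): Unit = {
    println(clean("1,34.35!"))  // prints: 1_34_35_
    println(clean("a?b$c"))     // prints: a_b_c
  }
}
```

In Spark, the same pattern string could presumably be passed as the second argument, for example regexp_replace(df("x4"), "[,.!?$]", "_").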