Scala: Splitting a string into two columns in a Spark DataFrame

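I have a DataFrame with a row whose value is "My name is Rahul". I want to split it so that "My name is" ends up in one column and "Rahul" in another. There is no delimiter here that I could pass to the split function. How can I do this in Spark?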

Use Spark's regexp_extract function instead of the split function.

Regex explanation:

(.*)\\s(.*) // group 1 greedily captures everything up to the last whitespace character (\s); group 2 captures everything after it.
Example:

// Already in scope in spark-shell; in an application, import them (spark is the SparkSession).
import spark.implicits._
import org.apache.spark.sql.functions.regexp_extract

val df= Seq(("My name is Rahul")).toDF("text") //sample string

df.withColumn("col1",regexp_extract($"text","(.*)\\s(.*)",1)).
withColumn("col2",regexp_extract($"text","(.*)\\s(.*)",2)).
show()
Result:

+----------------+----------+-----+
|            text|      col1| col2|
+----------------+----------+-----+
|My name is Rahul|My name is|Rahul|
+----------------+----------+-----+
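
The greedy behaviour of the pattern (.*)\s(.*) can be checked without a Spark session. The following is a minimal plain-Scala sketch, not part of the original answer: the object name LastSpaceSplit and the helper splitAtLastSpace are hypothetical, chosen only for illustration. It also shows the case where the input contains no whitespace, so the pattern does not match at all.

```scala
// Plain Scala sketch (no Spark required) of how the greedy pattern splits at the last space.
// LastSpaceSplit and splitAtLastSpace are hypothetical names, not Spark APIs.
object LastSpaceSplit {
  // Same pattern as in the answer; inside a triple-quoted string the backslash needs no escaping.
  private val Pattern = """(.*)\s(.*)""".r

  // Returns (everything before the last whitespace, everything after it),
  // or None when the text contains no whitespace and the pattern cannot match.
  def splitAtLastSpace(text: String): Option[(String, String)] = text match {
    case Pattern(head, last) => Some((head, last))
    case _                   => None
  }

  def main(args: Array[String]): Unit = {
    println(splitAtLastSpace("My name is Rahul")) // prints Some((My name is,Rahul))
    println(splitAtLastSpace("Rahul"))            // prints None
  }
}
```

In Spark itself, regexp_extract returns an empty string rather than null when the pattern does not match, so a value with no whitespace would leave both col1 and col2 empty with the pattern used here.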