Scala Spark: index of a value in an array inside a ColumnType


In Spark, using Scala, I have created a Dataset in which one column holds arrays like these:

[foo, bar, baz, bei]
[foo, bar, baz, bei]
[foo, zie]
Now I want to add another column that contains the index of the value "bar" within each array.

Is there something similar to

.withColumn("idx", array_contains(col("Name"),"bar"))
which only returns true/false, that gives me the index of the value instead?

With a UDF:

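import org.apache.spark.sql.functions.udf
import spark.implicits._ // assumes a SparkSession named `spark` is in scope (as in spark-shell)

val df = List(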
  Seq("foo", "bar", "baz", "bei"),
  Seq("foo", "bar", "baz", "bei"),
  Seq("foo", "zie")
).toDF()

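// indexOf is zero-based and returns -1 when "bar" is absent,
// so adding 1 gives a one-based position where 0 means "not found"
val getIndex = (seq: Seq[String]) => seq.indexOf("bar") + 1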
val getIndexUDF = udf(getIndex)

val result = df.withColumn("idx", getIndexUDF($"value"))
result.show(false)
Output:

+--------------------+---+
|value               |idx|
+--------------------+---+
|[foo, bar, baz, bei]|2  |
|[foo, bar, baz, bei]|2  |
|[foo, zie]          |0  |
+--------------------+---+
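
The `+ 1` in the UDF above is what produces the 0 in the last row: Scala's `indexOf` is zero-based and returns -1 for a missing value. Here is a minimal sketch of the same logic on plain Scala collections, with no Spark involved. The names `PositionSketch` and `positionOf` are hypothetical, and the null guard is an assumption about how one might treat a null array cell, not something the answer above does.

```scala
object PositionSketch {
  // Hypothetical helper (not part of Spark): one-based position of `target`
  // in `seq`; 0 when the value is absent or when the sequence itself is null.
  def positionOf(seq: Seq[String], target: String): Int =
    if (seq == null) 0 else seq.indexOf(target) + 1

  def main(args: Array[String]): Unit = {
    val rows = List(
      Seq("foo", "bar", "baz", "bei"),
      Seq("foo", "bar", "baz", "bei"),
      Seq("foo", "zie")
    )
    // Apply the helper to every row, as the UDF does per DataFrame row
    val positions = rows.map(positionOf(_, "bar"))
    println(positions) // List(2, 2, 0)
  }
}
```

Wrapping a single-argument closure such as `(s: Seq[String]) => PositionSketch.positionOf(s, "bar")` with `udf` should give the same `idx` column as the table above, with the added null handling.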

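Since version 2.4.0, Spark provides the array_position function, which returns the one-based position of the first occurrence of the value, or 0 if the value is not found: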

import org.apache.spark.sql.functions.array_position

df.withColumn("idx", array_position($"Name", "bar"))