How to update column values in an array of structs in Spark Scala


I just want to know whether it is possible to update an array of structs to some value when I have a list of columns that should not be updated. For example, if I have a List[String] = List(zebra, dog), is it possible to set all the other fields of the array to 0, so that Elephant and Lion would become 0?

root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- Animal: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- Elephant: string (nullable = false)
 |    |    |-- Lion: string (nullable = true)
 |    |    |-- Zebra: string (nullable = true)
 |    |    |-- Dog: string (nullable = true)
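For reference, here is a minimal sketch of sample data matching the rows used in this question. The schema above uses an Animal column, while the sample output further down uses a webhooks column with lowercase field names, so the sketch follows the latter; the SparkSession setup and the case class names are illustrative assumptions only.

import org.apache.spark.sql.SparkSession

// Skip the builder if you are in spark-shell, where `spark` already exists.
val spark = SparkSession.builder().appName("webhooks-sample").master("local[*]").getOrCreate()
import spark.implicits._

// One struct per webhook entry; field names follow the sample output below.
case class Webhook(elephant: String, lion: String, zebra: String, dog: String)
case class Record(_id: String, h: String, inc: String, op: String, ts: String,
                  webhooks: Seq[Webhook])

// Rows copied from the sample tables shown later in the thread.
val ddf = Seq(
  Record("fa1", "fa11", "fa111", "fa1111", "fa11111", Seq(Webhook("1", "11", "111", "1111"))),
  Record("fb1", "fb11", "fb111", "fb1111", "fb11111", Seq(Webhook("2", "22", "222", "2222")))
).toDF()

ddf.show(false)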
I was iterating row by row, as if I had written a function for it.

+---+----+-----+------+-------+--------------+
|_id|h   |inc  |op    |ts     |webhooks      |
+---+----+-----+------+-------+--------------+
|fa1|fa11|fa111|fa1111|fa11111|[[1, 1, 0, 1]]|
|fb1|fb11|fb111|fb1111|fb11111|[[0, 1, 1, 0]]|
+---+----+-----+------+-------+--------------+

After the operation it should be:

+---+----+-----+------+-------+--------------+
|_id|h   |inc  |op    |ts     |webhooks      |
+---+----+-----+------+-------+--------------+
|fa1|fa11|fa111|fa1111|fa11111|[[0, 0, 0, 1]]|
|fb1|fb11|fb111|fb1111|fb11111|[[0, 0, 1, 0]]|
+---+----+-----+------+-------+--------------+
But I was not able to make this work.

Please check the code below:

def changeValue(row: Row) = {
  // some code
}
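
For context, here is a minimal sketch of what such a row-wise changeValue might look like, assuming a Spark version where RowEncoder(schema) is available and that the array column is named webhooks as in the sample output; keepColumns and updatedDdf are hypothetical names. The column-expression approach in the answer below avoids this per-row work.

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.{ArrayType, StructType}

// Field names of the structs inside the webhooks array, read from the schema.
val webhookFields = ddf.schema("webhooks").dataType
  .asInstanceOf[ArrayType].elementType
  .asInstanceOf[StructType].fieldNames

// Hypothetical: fields to keep; every other field is overwritten with "0".
val keepColumns = Set("zebra", "dog")
val webhooksIdx = ddf.schema.fieldIndex("webhooks")

def changeValue(row: Row): Row = {
  val updatedHooks = row.getSeq[Row](webhooksIdx).map { hook =>
    Row.fromSeq(webhookFields.toSeq.map { f =>
      if (keepColumns(f.toLowerCase)) hook.getAs[String](f) else "0"
    })
  }
  Row.fromSeq(row.toSeq.updated(webhooksIdx, updatedHooks))
}

// Map over the DataFrame row by row, keeping the original schema.
val updatedDdf = ddf.map(r => changeValue(r))(RowEncoder(ddf.schema))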
Constructing the expression

scala> ddf.show(false)
+---+----+-----+------+-------+--------------------+
|_id|h   |inc  |op    |ts     |webhooks            |
+---+----+-----+------+-------+--------------------+
|fa1|fa11|fa111|fa1111|fa11111|[[1, 11, 111, 1111]]|
|fb1|fb11|fb111|fb1111|fb11111|[[2, 22, 222, 2222]]|
+---+----+-----+------+-------+--------------------+


scala> val columnsTobeUpdatedInWebhooks = Seq("zebra","dog") // Columns to be updated in webhooks.
columnsTobeUpdatedInWebhooks: Seq[String] = List(zebra, dog)
val expr = flatten(
    array(
        ddf
        .select(explode($"webhooks").as("webhooks"))
        .select("webhooks.*")
        .columns
        .map(c => if(columnsTobeUpdatedInWebhooks.contains(c)) col(s"webhooks.${c}").as(c) else array(lit(0)).as(c)):_*
    )
)

expr: org.apache.spark.sql.Column = flatten(array(array(0) AS `elephant`, array(0) AS `lion`, webhooks.zebra AS `zebra`, webhooks.dog AS `dog`))
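
One detail worth noting in this expression, assuming the ddf shown above: selecting a field from an array of structs (webhooks.zebra) yields an array with one value per struct, not a single string, which is why the kept columns can be mixed with array(lit(0)) and flattened into one array per row. A quick way to confirm this:

// webhooks.zebra resolves to array<string>, one entry per struct in webhooks.
ddf.select($"webhooks.zebra").printSchema()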

Applying the expression

The final result:

scala> ddf.withColumn("webhooks",struct(expr)).show(false)
+---+----+-----+------+-------+--------------+
|_id|h   |inc  |op    |ts     |webhooks      |
+---+----+-----+------+-------+--------------+
|fa1|fa11|fa111|fa1111|fa11111|[[0, 0, 0, 1]]|
|fb1|fb11|fb111|fb1111|fb11111|[[0, 0, 1, 0]]|
+---+----+-----+------+-------+--------------+
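
Note that wrapping the flattened expression in struct(...) changes the column type: webhooks becomes a struct holding a single array<string>, even though show() renders it like the original array of structs. If the original array<struct> shape has to be preserved, a sketch along these lines should work, assuming Spark 2.4+ for the transform higher-order function; the field names and the "0" replacement value are taken from the sample above:

// Renamed import so it does not clash with the `expr` Column defined above.
import org.apache.spark.sql.functions.{expr => sqlExpr}

// Rebuild each struct inside the array, keeping zebra and dog and
// overwriting the remaining fields with "0".
val preserved = ddf.withColumn(
  "webhooks",
  sqlExpr("""
    transform(webhooks, w -> named_struct(
      'elephant', '0',
      'lion',     '0',
      'zebra',    w.zebra,
      'dog',      w.dog
    ))
  """)
)

preserved.show(false)
preserved.printSchema()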


Is this solution not working for you?