
Apache Spark: how to solve this query in Hive and Spark?


Write a Hive SQL query that, given the input table below, displays the output shown.

id    name     dob
----------------------------
1     anjan    10-16-1989
Output:

id    name     dob
----------------------------
1     a        10-16-1989
1     n        10-16-1989
1     j        10-16-1989
1     a        10-16-1989
1     n        10-16-1989

Also solve the above scenario in Spark, producing a DataFrame (named data) with the same output.

Assuming you have a DataFrame from Hive that looks like this:

+---+-----+----------+
| id| name|       dob|
+---+-----+----------+
|  1|anjan|10-16-1989|
+---+-----+----------+
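(If you want to reproduce this without a Hive table, a small DataFrame of the same shape can be built directly; a minimal sketch, assuming a spark-shell or a SparkSession named spark, with data as the hypothetical variable name used below:)

import spark.implicits._

// Hypothetical stand-in for the DataFrame that would normally be read from Hive
val data = Seq((1, "anjan", "10-16-1989")).toDF("id", "name", "dob")
data.show()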
You can define a user-defined function (UDF) in Spark that converts the string into an array of single-character strings:

import org.apache.spark.sql.functions.{explode, udf}
val toArray = udf((name: String) => name.toArray.map(_.toString))
We can then apply it to the name column:

val df = data.withColumn("name", toArray(data("name")))

+---+---------------+----------+
| id|           name|       dob|
+---+---------------+----------+
|  1|[a, n, j, a, n]|10-16-1989|
+---+---------------+----------+
We can now use the explode function on the name column:

df.withColumn("name", explode(df("name")))

+---+----+----------+
| id|name|       dob|
+---+----+----------+
|  1|   a|10-16-1989|
|  1|   n|10-16-1989|
|  1|   j|10-16-1989|
|  1|   a|10-16-1989|
|  1|   n|10-16-1989|
+---+----+----------+
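As an alternative, the UDF is not strictly necessary: Spark's built-in split function can break the string into single-character strings, and the same idea written with LATERAL VIEW should also run in Hive, which covers the first part of the question. This is a sketch, assuming a SparkSession named spark, the data DataFrame from above, and a hypothetical view name people; depending on the Hive/Spark version, split with an empty pattern may produce empty strings, hence the filter:

import org.apache.spark.sql.functions.{col, explode, length, split}

// Built-in split instead of a UDF: splitting on the empty pattern yields one-character strings
val exploded = data
  .withColumn("name", explode(split(col("name"), "")))
  .filter(length(col("name")) > 0)   // drop empty strings that some versions emit
exploded.show()

// The same result expressed as SQL; a LATERAL VIEW query along these lines should also work in Hive
data.createOrReplaceTempView("people")   // hypothetical view/table name
spark.sql("""
  SELECT id, ch AS name, dob
  FROM people
  LATERAL VIEW explode(split(name, '')) t AS ch
  WHERE ch <> ''
""").show()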