Apache Spark: How to access an array using foreach in Spark?

I have data like the following:

tab1,c1|c2|c3
tab2,d1|d2|d3|d4|d5
tab3,e1|e2|e3|e4
I need to convert it in Spark into the following:

select c1,c2,c3 from tab1;
select d1,d2,d3,d4,d5 from tab2;
select e1,e2,e3,e4 from tab3;
This is as far as I have been able to get:

d.foreach(f=>{println("select"+" "+f+" from"+";")})
select tab3,e1,e2,e3,e4 from;
select tab1,c1,c2,c3 from;
select tab2,d1,d2,d3,d4,d5 from;

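Can anyone suggest how to do this?

I don't think Spark is a good fit for your problem. What does the variable "d" represent?

Here is my guess at something that might be useful: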

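from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# Reuse the active session (e.g. in the pyspark shell) or create one
spark = SparkSession.builder.getOrCreate()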

mySchema = StructType([
  StructField("table_name", StringType()),
  StructField("column_name", 
    ArrayType(StringType())
  )
])

df = spark.createDataFrame([
                            ("tab1",["c1","c2","c3"]),
                            ("tab2",["d1","d2","d3","d4","d5"]),
                            ("tab3",["e1","e2","e3","e4"])
  ],
  schema = mySchema
)

df.selectExpr('concat("select ", concat_ws(",", column_name), " from ", table_name, ";") as select_string').show(3, False)
Output:

+--------------------------------+
|select_string                   |
+--------------------------------+
|select c1,c2,c3 from tab1;      |
|select d1,d2,d3,d4,d5 from tab2;|
|select e1,e2,e3,e4 from tab3;   |
+--------------------------------+
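
The rows above are typed in by hand, while the question starts from raw lines such as tab1,c1|c2|c3. Below is a minimal sketch, in plain Python, of how those lines might be turned into (table_name, column_name) tuples that match mySchema. The names parse_line and raw_lines are hypothetical and do not come from the original answer.

```python
def parse_line(line):
    """Split 'tab1,c1|c2|c3' into ('tab1', ['c1', 'c2', 'c3'])."""
    # Split once on the first comma: table name on the left, columns on the right
    table_name, _, columns = line.strip().partition(",")
    return table_name, columns.split("|")


# Sample input copied from the question
raw_lines = ["tab1,c1|c2|c3", "tab2,d1|d2|d3|d4|d5", "tab3,e1|e2|e3|e4"]
rows = [parse_line(line) for line in raw_lines]
```

The resulting rows list should be usable in place of the literal list passed to spark.createDataFrame, with the same schema.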
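You can also use a map operation on an RDD.

Suppose you have an RDD of strings like the ones in the question (for example tab1,c1|c2|c3).

With this operation: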

val select = rdd.map(str=> {
      val separated = str.split(",", -1)
      val table = separated(0)
      val cols = separated(1).split("\\|", -1).mkString(",")

      "select " + cols + " from " + table + ";"
    })
you will get the expected result:

select.foreach(println(_))
select d1,d2,d3,d4,d5 from tab2;
select e1,e2,e3,e4 from tab3;
select c1,c2,c3 from tab1;
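
For PySpark users, the same per-line logic can be sketched in Python. This version also skips malformed lines; the Scala snippet above would throw on a line with no comma, because separated(1) would not exist. The helper name to_select is hypothetical, and dropping bad lines (rather than raising an error) is an assumption about the desired behavior.

```python
def to_select(line):
    """Turn 'tab1,c1|c2|c3' into 'select c1,c2,c3 from tab1;'.

    Returns None for lines that lack a comma, a table name, or a column list.
    """
    table, sep, cols = line.strip().partition(",")
    if not sep or not table or not cols:
        return None  # assumption: malformed lines are dropped, not raised
    return "select " + ",".join(cols.split("|")) + " from " + table + ";"


# Two lines from the question plus one deliberately malformed line
lines = ["tab1,c1|c2|c3", "tab2,d1|d2|d3|d4|d5", "broken-line"]
statements = [s for s in map(to_select, lines) if s is not None]
for s in statements:
    print(s)
```

With an RDD of strings, the same function could be applied as rdd.map(to_select).filter(lambda s: s is not None).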

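That is one option. I was actually trying to build it from an RDD.

Hey pheeleeppoo, thank you very much for your code. I got stuck while working on this and ended up doing it another way. Anyway, thanks for your time.

It is useful for others if you share what you did and/or accept an answer that helped you.

Your answer was really helpful! Thanks.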