Apache Spark: adding surrounding quotes to fields when loading data into Hive
I have data like the following:
1,Anna,London
2,Peter,Amsterdam
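I want to load this data into Hive as a DataFrame and add surrounding quotes to every field, so that the data in the DataFrame looks like this: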
"1" "Anna" "London"
"2" "Peter" "Amsterdam"
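I have set the delimiter to ",". I know there is a `quote` option, but it does the opposite (it strips quotes when reading). How can I add the quotes?

You can do this with the `format_string` function: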
scala> val df = Seq(("1","Anna","London"),("2","Peter","Amsterdam")).toDF()
df: org.apache.spark.sql.DataFrame = [_1: string, _2: string ... 1 more field]
scala> df.show()
+---+-----+---------+
| _1|   _2|       _3|
+---+-----+---------+
|  1| Anna|   London|
|  2|Peter|Amsterdam|
+---+-----+---------+
scala> val c = df.columns.map(df(_)).map(format_string("\"%s\"", _))
c: Array[org.apache.spark.sql.Column] = Array(format_string("%s", _1), format_string("%s", _2), format_string("%s", _3))
scala> df.select(c:_*).toDF(df.columns:_*).show()
+---+-------+-----------+
| _1|     _2|         _3|
+---+-------+-----------+
|"1"| "Anna"|   "London"|
|"2"|"Peter"|"Amsterdam"|
+---+-------+-----------+
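If the end goal is a Hive table rather than an interactive session, the same idea can be packaged as a small standalone job. This is only a sketch under stated assumptions, not part of the original answer: it swaps `format_string` for the equivalent `concat(lit("\""), col, lit("\""))`, keeps the original column names with `.as`, and the object name `QuoteColumns`, the column names `id`/`name`/`city`, the CSV path and the table name `people_quoted` are all hypothetical. It assumes `spark-sql` is on the classpath (plus `spark-hive` if you enable Hive support to call `saveAsTable`).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object QuoteColumns {

  // Wrap every column's value in double quotes and keep the original column names.
  // concat returns null if its input is null, so null fields stay null.
  def quoteAll(df: DataFrame): DataFrame = {
    val quoted: Array[Column] = df.columns.map { name =>
      concat(lit("\""), col(name).cast("string"), lit("\"")).as(name)
    }
    df.select(quoted: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("quote-columns-sketch")
      .master("local[*]")
      // .enableHiveSupport() // uncomment (with spark-hive on the classpath) to write Hive tables
      .getOrCreate()
    import spark.implicits._

    // In-memory stand-in for the question's data. With a real file you might instead use:
    //   spark.read.option("delimiter", ",").csv("/path/to/people.csv").toDF("id", "name", "city")
    // where the path and column names are hypothetical.
    val people = Seq(("1", "Anna", "London"), ("2", "Peter", "Amsterdam"))
      .toDF("id", "name", "city")

    val quoted = quoteAll(people)
    quoted.show()

    // Hypothetical table name; requires enableHiveSupport() above.
    // quoted.write.mode("overwrite").saveAsTable("people_quoted")

    spark.stop()
  }
}
```

One design difference to keep in mind: `concat` yields null for a null field, whereas `format_string` is printf-style and will likely render a null argument as the text `null` inside the quotes; check which behaviour you want on your Spark version before choosing between them.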