How to transform CSV data using Java Spark
I'm using Java Spark, and I'd like to know whether there is any way to transform the sample data given below
Incremental Cost Number | Approver Names
---------------------------------------------------------------------------------
S703401 |Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte
into something like this:
Incremental Cost Number| Approver Names
-------------------------------------------
S703401 | Ryan P Cassidy
S703401 | Christopher J Mattingly
S703401 | Frank E LaSota
S703401 | Ryan P Cassidy
S703401 | Anthony L Locricchio
S703401 | Jason Monte
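Also, the file I'm importing is a comma-separated CSV file; only one particular column contains multiple values, separated by the pipe symbol. The same applies if there are multiple values of Incremental Cost Number.

I think you need to split the second column on "|" and then use the explode() function.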
Note: this is the RDD way of doing things. It may be easier to implement with Scala and DataFrames.
If there are multiple columns, you can do something like this:
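import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`, as in spark-shell

val df = Seq((
  "S703401",
  "Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota|Ryan P Cassidy|Anthony L Locricchio|Jason Monte",
  "xyz|mnp|abc"
)).toDF("Incremental Cost Number", "Approver Names", "3rd Column")

df.show()  // the input, before exploding

// split takes a regex, so the pipe is escaped; each explode emits one row per array element
df.withColumn("Approver Names", explode(split(col("Approver Names"), "\\|")))
  .withColumn("3rd Column", explode(split(col("3rd Column"), "\\|")))
  .show()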
+-----------------------+--------------------+-----------+
|Incremental Cost Number| Approver Names| 3rd Column|
+-----------------------+--------------------+-----------+
| S703401|Ryan P Cassidy|Ch...|xyz|mnp|abc|
+-----------------------+--------------------+-----------+
+-----------------------+--------------------+----------+
|Incremental Cost Number| Approver Names|3rd Column|
+-----------------------+--------------------+----------+
| S703401| Ryan P Cassidy| xyz|
| S703401| Ryan P Cassidy| mnp|
| S703401| Ryan P Cassidy| abc|
| S703401|Christopher J Mat...| xyz|
| S703401|Christopher J Mat...| mnp|
| S703401|Christopher J Mat...| abc|
| S703401| Frank E LaSota| xyz|
| S703401| Frank E LaSota| mnp|
| S703401| Frank E LaSota| abc|
| S703401| Ryan P Cassidy| xyz|
| S703401| Ryan P Cassidy| mnp|
| S703401| Ryan P Cassidy| abc|
| S703401|Anthony L Locricchio| xyz|
| S703401|Anthony L Locricchio| mnp|
| S703401|Anthony L Locricchio| abc|
| S703401| Jason Monte| xyz|
| S703401| Jason Monte| mnp|
| S703401| Jason Monte| abc|
+-----------------------+--------------------+----------+
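The 18 rows in the second table are every combination of the 6 approver names with the 3 values of the third column, because each explode is applied to the rows produced by the previous one. The following is a minimal plain-Java sketch of that behaviour, with no Spark dependency. The class `ChainedExplode` and the helper `explodeColumn` are hypothetical names used only for illustration; they stand in for `explode(split(...))` and are not part of any Spark API.

```java
import java.util.ArrayList;
import java.util.List;

final class ChainedExplode {
    private ChainedExplode() {}

    // Hypothetical stand-in for explode(split(col, "\\|")):
    // for every input row, emit one copy per pipe-separated part of column `index`.
    static List<String[]> explodeColumn(List<String[]> rows, int index) {
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            // "|" is a regex metacharacter, so it has to be escaped.
            for (String part : row[index].split("\\|")) {
                String[] copy = row.clone();
                copy[index] = part;
                out.add(copy);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {
            "S703401",
            "Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota"
                + "|Ryan P Cassidy|Anthony L Locricchio|Jason Monte",
            "xyz|mnp|abc"
        });

        // Explode column 1 first, then column 2 on the already exploded rows.
        List<String[]> exploded = explodeColumn(explodeColumn(rows, 1), 2);

        System.out.println(exploded.size()); // 6 names x 3 values = 18
        for (String[] r : exploded) {
            System.out.println(String.join(" | ", r));
        }
    }
}
```

If the values in the two columns are meant to be paired by position rather than combined, chaining explodes this way is probably not what you want.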
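Hi @yyy - can you tell us what you've tried?
Hi @mrblewog, I'm stuck at the logic level; I don't know how to do this with Java Spark. Is there a way to do it using RDDs or Datasets?
This works fine if I have two columns, but if I have to add, say, 4 columns in a similar way, will I still be able to use the explode function?
Yes. You will see more copies of the columns than you need, but it works. Just try it.
Can you explain step 3 in full?
Create a function that takes the comma-separated string as input and outputs a tuple containing the tokens. Pass that function to rdd.map.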
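Since the question asks for Java, here is a rough plain-Java sketch of the kind of function the last comment describes. It takes one comma-separated line and returns one (cost number, approver) pair per pipe-separated name. The class `ApproverRows` and the method `explodeLine` are hypothetical names, and the sketch assumes each line has exactly two comma-separated fields with no quoted commas; a real CSV parser would be safer for anything else.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ApproverRows {
    private ApproverRows() {}

    // Hypothetical helper: "S703401,A|B|C" -> [(S703401, A), (S703401, B), (S703401, C)].
    // Assumes two comma-separated fields and no quoted commas.
    static List<Map.Entry<String, String>> explodeLine(String line) {
        String[] fields = line.split(",", 2);
        String costNumber = fields[0].trim();
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        if (fields.length < 2) {
            return rows; // no approver column on this line
        }
        // "|" is a regex metacharacter, so it has to be escaped.
        for (String name : fields[1].split("\\|")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                rows.add(new SimpleImmutableEntry<>(costNumber, trimmed));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        String line = "S703401,Ryan P Cassidy|Christopher J Mattingly|Frank E LaSota";
        for (Map.Entry<String, String> row : explodeLine(line)) {
            System.out.println(row.getKey() + " | " + row.getValue());
        }
    }
}
```

With Spark 2.x or later, a function like this could be applied to a `JavaRDD<String>` of lines with `lines.flatMap(l -> ApproverRows.explodeLine(l).iterator())`, so that each input line yields several output rows; `map` alone would give one list per line rather than one row per approver.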