Scala: how to read a csv file as a map of key-value pairs


I have data in a csv file, for example:

value,key
A,Name
B,Name
C,Name
24,Age
25,Age
20,Age
M,Gender
F,Gender
I want to parse it to produce the following map:

Map(Name -> List(A, B, C), Age -> List(24,25,20), Gender -> List(M,F))
Here is one possibility:

import scala.io.Source

Source.fromFile("my/path")
  .getLines()
  .drop(1) // Drop the header (first line)
  .map(_.split(",")) // Split by ",": Array(A, Name), Array(B, Name), Array(C, Name), ...
  .toList // Materialize the lines: Iterator has no groupBy
  .groupBy(_(1)) // group by key (second column): Map(Age -> List(Array(24, Age), Array(25, Age), Array(20, Age)), ...
  .map{ case (key, values) => (key, values.map(_(0))) } // final format: Map(Age -> List(24, 25, 20), ...
which gives:

Map(Age -> List(24, 25, 20), Name -> List(A, B, C), Gender -> List(M, F))
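A minimal sketch of the same group-by-second-column idea, assuming Scala 2.13+ (for `scala.util.Using`), wrapped in a function that closes the `Source` when it is done. `readKeyValues` and `GroupCsv` are hypothetical names; the function takes a `Source` so it can be exercised with `Source.fromString` instead of a real file. Because `Iterator` has no `groupBy`, the lines are materialized with `toList` before grouping.

```scala
import scala.io.Source
import scala.util.Using

object GroupCsv {
  // Hypothetical helper: groups "value,key" lines (after a header row) by key.
  def readKeyValues(source: Source): Map[String, List[String]] =
    Using.resource(source) { src =>   // closes the Source even if parsing throws
      src.getLines()
        .drop(1)                      // skip the "value,key" header
        .filter(_.nonEmpty)           // assumption: blank lines should be ignored
        .map(_.split(","))
        .toList                       // Iterator has no groupBy, so materialize first
        .groupBy(cols => cols(1))     // key is the second column
        .map { case (key, rows) => key -> rows.map(cols => cols(0)) }
    }

  def main(args: Array[String]): Unit = {
    val csv = "value,key\nA,Name\nB,Name\n24,Age\n"
    println(readKeyValues(Source.fromString(csv)))
  }
}
```

For a real file you would pass `Source.fromFile("my/path")` instead.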

This code produces the desired output:

import scala.io.Source

Source.fromFile("C:\\src\\data.txt").getLines()
            .drop(1).map(_.split(",").toList) // gives each list like this -- List(A, Name)
            .map(x => (x.tail.head -> x.head)).toList // swap key and value places  -- (Name,A)
            .groupBy(_._1) // group by key -- (Age,List((Age,24), (Age,25), (Age,20)))
            .map(x => x._1 -> x._2.map(v => v._2)).toMap // extracting only values part -- Map(Age -> List(24, 25, 20), Name -> List(A, B, C), Gender -> List(M, F))
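As a sketch assuming Scala 2.13+, `groupMap` groups by one function and maps each element with another in a single call, which replaces the swap, `groupBy` and value-extraction steps above. `GroupMapExample.group` is a hypothetical name; it still needs `toList`, because `groupMap` is defined on collections, not on `Iterator`.

```scala
object GroupMapExample {
  // Hypothetical helper: expects "value,key" lines with a header row first.
  def group(lines: Iterator[String]): Map[String, List[String]] =
    lines
      .drop(1)                     // skip the header
      .map(_.split(","))
      .toList                      // groupMap is not available on Iterator
      .groupMap(cols => cols(1))(cols => cols(0)) // key = 2nd column, value = 1st column

  def main(args: Array[String]): Unit =
    println(group(Iterator("value,key", "A,Name", "B,Name", "24,Age")))
}
```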

If you would rather not iterate over the dataset more than once, here is a single-pass solution:

import scala.io.Source
import scala.collection.mutable

val m = mutable.Map[String, List[String]]().withDefaultValue(List.empty)

Source.fromFile("my/path")
    .getLines()
    .drop(1)
    .map(_.split(","))
    .foreach(x => m.put(x(1), x(0) :: m(x(1)))) // prepends, so each list comes out reversed, e.g. Name -> List(C, B, A)
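Prepending with `::` reverses each list relative to the file. A minimal single-pass sketch that keeps file order, assuming Scala 2.13+: a `mutable.LinkedHashMap` remembers the order in which keys were first seen, a `ListBuffer` appends in constant time, and the result is copied into an immutable `ListMap`, which iterates in insertion order. `SinglePassGroup.group` is a hypothetical name.

```scala
import scala.collection.immutable.ListMap
import scala.collection.mutable

object SinglePassGroup {
  // Hypothetical helper: one pass over "value,key" lines (header first).
  def group(lines: Iterator[String]): ListMap[String, List[String]] = {
    // LinkedHashMap keeps keys in first-seen order; ListBuffer appends in O(1)
    val acc = mutable.LinkedHashMap.empty[String, mutable.ListBuffer[String]]
    lines.drop(1).foreach { line =>
      val cols = line.split(",")
      acc.getOrElseUpdate(cols(1), mutable.ListBuffer.empty[String]) += cols(0)
    }
    // Copy into an immutable ListMap, which also iterates in insertion order
    ListMap.from(acc.iterator.map { case (key, buf) => key -> buf.toList })
  }

  def main(args: Array[String]): Unit =
    println(group(Iterator("value,key", "A,Name", "B,Name", "24,Age", "M,Gender")))
}
```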
Step by step:

scala> val doc = """A,Name
 | B,Name
 | C,Name
 | 24,Age
 | 25,Age
 | 20,Age
 | M,Gender
 | F,Gender""".stripMargin
doc: String =
A,Name
B,Name
C,Name
24,Age
25,Age
20,Age
M,Gender
F,Gender

scala> doc.split("\\n")
res0: Array[String] = Array(A,Name, B,Name, C,Name, 24,Age, 25,Age, 20,Age, M,Gender, F,Gender)

scala> res0.toList.map{ x => val line = x.split(","); line(1) -> line(0)}
res1: List[(String, String)] = List((Name,A), (Name,B), (Name,C), (Age,24), (Age,25), (Age,20), (Gender,M), (Gender,F))

scala> res1.groupBy(e => e._1)
res4: scala.collection.immutable.Map[String,List[(String, String)]] = Map(Age -> List((Age,24), (Age,25), (Age,20)), Name -> List((Name,A), (Name,B), (Name,C)), Gender -> List((Gender,M), (Gender,F)))

scala> res4.mapValues{x => x.map{case (k,v) => v}} 
res6: scala.collection.immutable.Map[String,List[String]] = Map(Age -> List(24, 25, 20), Name -> List(A, B, C), Gender -> List(M, F))
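In Scala 2.13, `Map.mapValues` is deprecated and returns a lazy `MapView` that re-applies the function on each access. A small sketch of the strict replacement, assuming Scala 2.13+ and using a shortened copy of the pairs from the session above:

```scala
object StrictMapValues {
  // A shortened version of the (key, value) pairs from the REPL session
  val pairs: List[(String, String)] =
    List("Name" -> "A", "Name" -> "B", "Age" -> "24", "Gender" -> "M")

  val grouped: Map[String, List[(String, String)]] = pairs.groupBy(_._1)

  // Build the MapView, then force it into a strict immutable Map once
  val byKey: Map[String, List[String]] =
    grouped.view.mapValues(_.map { case (_, v) => v }).toMap

  def main(args: Array[String]): Unit = println(byKey)
}
```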

A more functional approach:

import scala.io.Source

Source.fromFile("file.csv").getLines().drop(1).foldLeft(Map.empty[String, List[String]]) {
    (acc, line) =>
      val value :: key :: Nil = line.split(",").toList // assumes exactly two columns per line
      acc + (key -> (acc.getOrElse(key, List.empty) :+ value))
  }
which gives:

Map(Name -> List(A, B, C), Age -> List(24, 25, 20), Gender -> List(M, F))
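Appending to a `List` with `:+` copies the whole list each time. A sketch of the same fold that prepends in constant time and reverses each list once at the end; it matches on `Array(value, key)` instead of the `::` pattern, so a malformed line is skipped rather than throwing a `MatchError`. Skipping bad lines silently is an assumption, and `FoldGroup.group` is a hypothetical name.

```scala
object FoldGroup {
  // Hypothetical helper: "value,key" lines with a header row first.
  def group(lines: Iterator[String]): Map[String, List[String]] =
    lines
      .drop(1) // skip the header
      .foldLeft(Map.empty[String, List[String]]) { (acc, line) =>
        line.split(",") match {
          case Array(value, key) => acc.updated(key, value :: acc.getOrElse(key, Nil)) // O(1) prepend
          case _                 => acc // assumption: skip lines without exactly two columns
        }
      }
      .map { case (key, values) => key -> values.reverse } // restore file order

  def main(args: Array[String]): Unit =
    println(group(Iterator("value,key", "A,Name", "B,Name", "24,Age")))
}
```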

This solution involves 4 conversions that could be avoided (toList, toTuple, toList, toMap).