Get the words and values between parentheses in Scala Spark

Here is my data:

doc1: (Does,1) (just,-1) (what,0) (was,1) (needed,1) (to,0) (charge,1) (the,0) (Macbook,1)
doc2: (Pro,1) (G4,-1) (13inch,0) (laptop,1)
doc3: (Only,1) (beef,0) (was,1) (it,0) (no,-1) (longer,0) (lights,-1) (up,0) (the,-1)
etc...
I want to extract the words and the values and store them in two separate matrices: matrix 1 holding (docID, words) and matrix 2 holding (docID, values).

This should give:

List(
  (doc1 -> Does,just,what,was,needed,to,charge,the,Macbook),
  (doc2 -> Pro,G4,13inch,laptop),
  (doc3 -> Only,beef,was,it,no,longer,lights,up,the)
)

List(
  (doc1 -> 1,-1,0,1,1,0,1,0,1), 
  (doc2 -> 1,-1,0,1), 
  (doc3 -> 1,0,1,0,-1,0,-1,0,-1)
)

What is your working approach/plan so far? Is all of this data in one file, or is each line in a separate file?

Hi, I am running it in IntelliJ IDEA but it gives some errors; I also added the import (import breeze.linalg.Axis.u), but it does not work, as if the compiler does not like the shorthand.

Update:
val inputText = sc.textFile("input.txt")
var digested = input.map(line => line.split(":"))
        .map(row => row(0) -> row(1).trim.split(" "))
        .map(row => row._1 -> row._2.map(_.stripPrefix("(").stripSuffix(")").trim.split(",")))

var matrix_1 = digested.map(row => row._1 -> row._2.map( a => a(0)))
var matrix_2 = digested.map(row => row._1 -> row._2.map( a => a(1)))
which gives:

List(
  (doc1 -> Does,just,what,was,needed,to,charge,the,Macbook),
  (doc2 -> Pro,G4,13inch,laptop),
  (doc3 -> Only,beef,was,it,no,longer,lights,up,the)
)

List(
  (doc1 -> 1,-1,0,1,1,0,1,0,1),
  (doc2 -> 1,-1,0,1),
  (doc3 -> 1,0,1,0,-1,0,-1,0,-1)
)