How do I convert a DataFrame into label/features vectors in Scala?


scala apache-spark machine-learning


I'm running a logistic regression model in Scala, and my DataFrame looks like this:

I need to turn it into this:

+-----+------------------+
|label|      features    | 
+-----+------------------+
|  0.0|(1,[1],[0])       |
|  0.0|(1,[1],[33])      |
|  0.0|(1,[1],[58])      |
|  0.0|(1,[1],[96])      |
|  0.0|(1,[1],[1])       |
|  1.0|(1,[1],[21])      |
|  0.0|(1,[1],[10])      |
|  0.0|(1,[1],[65])      |
|  1.0|(1,[1],[7])       |
|  1.0|(1,[1],[28])      | 
+-----+------------------+
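In Spark's string form a sparse vector prints as (size,[indices],[values]), and indices are 0-based, so a one-element vector holding a y value sits at index 0 rather than index 1. A minimal sketch showing the sparse and dense forms of the same one-element vector (the object name SparseNotation is hypothetical):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object SparseNotation {
  // One-element feature vector holding 33.0 in sparse form:
  // size 1, a single entry at (0-based) index 0.
  val sparse: Vector = Vectors.sparse(1, Array(0), Array(33.0))

  // The same vector in dense form.
  val dense: Vector = Vectors.dense(33.0)

  def main(args: Array[String]): Unit = {
    println(sparse)
    println(dense)
  }
}
```

Both values compare equal; only the storage format differs, and Spark's feature transformers may emit either form.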

I tried:

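import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// The assembler is defined but never applied to the DataFrame
val assembler = new VectorAssembler()
  .setInputCols(Array("x"))
  .setOutputCol("Feature")

// Renaming "y" to "features" does not turn it into a Vector column,
// so fit() will reject it as the features column
var lrModel = lr.fit(daf.withColumnRenamed("x", "label").withColumnRenamed("y", "features"))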

Thanks for your help.

Given the DataFrame
+---+---+
|x  |y  |
+---+---+
|0  |0  |
|0  |33 |
|0  |58 |
|0  |96 |
|0  |1  |
|1  |21 |
|0  |10 |
|0  |65 |
|1  |7  |
|1  |28 |
+---+---+

and the following steps:
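import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType
import spark.implicits._ // enables the $"column" syntax (spark is the SparkSession)

val assembler = new VectorAssembler()
  .setInputCols(Array("x", "y"))
  .setOutputCol("features")

val output = assembler.transform(df).select($"x".cast(DoubleType).as("label"), $"features")
output.show(false)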

you will get this result:
+-----+----------+
|label|features  |
+-----+----------+
|0.0  |(2,[],[]) |
|0.0  |[0.0,33.0]|
|0.0  |[0.0,58.0]|
|0.0  |[0.0,96.0]|
|0.0  |[0.0,1.0] |
|1.0  |[1.0,21.0]|
|0.0  |[0.0,10.0]|
|0.0  |[0.0,65.0]|
|1.0  |[1.0,7.0] |
|1.0  |[1.0,28.0]|
+-----+----------+
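Note that the assembler above uses both x and y as inputs, so the label x also ends up inside features, and the model can learn the label from itself. To get the one-element layout asked for in the question, a minimal sketch (assuming the same x/y column names; the object and method names are hypothetical) assembles only y:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object LabelFeatures {
  // Builds a (label, features) DataFrame using only "y" as the feature,
  // so the label column "x" is not also fed to the model as an input.
  def toLabelFeatures(df: DataFrame): DataFrame = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("y"))
      .setOutputCol("features")

    assembler.transform(df)
      .select(col("x").cast(DoubleType).as("label"), col("features"))
  }
}
```

Usage: `LabelFeatures.toLabelFeatures(df).show(false)`, then pass the result to `lr.fit(...)` exactly as below. Each features value then has size 1; rows where y is 0 may be printed in sparse form, as in the first row of the table above.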

Now using LogisticRegression is easy:
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

val lrModel = lr.fit(output)
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

You will get output like this:
Coefficients: [1.5672602877378823,0.0] Intercept: -1.4055020984891717

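It throws an error saying VectorAssembler does not support the StringType type. Do you know why? By the way, the format is also different from the one I gave the model. Where did you get trip_status?

Sorry, that's actually just x. I can't import sqlContext.implicits._ directly; when I searched, it said this isn't possible in Spark 2.0.0. How do you create sqlContext?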
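Regarding the two follow-up problems: VectorAssembler only accepts numeric, boolean and vector input columns, so string columns (for example from a CSV read without schema inference) have to be cast first; and in Spark 2.x the entry point is a SparkSession, whose implicits are imported with `import spark.implicits._`. A minimal end-to-end sketch under those assumptions, using the answer's sample data as strings; the object name, app name and local master are illustrative only:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object StringColumnsToFeatures {
  // Casts each listed column to DoubleType, since VectorAssembler rejects StringType inputs.
  def castToDouble(df: DataFrame, cols: Seq[String]): DataFrame =
    cols.foldLeft(df)((acc, c) => acc.withColumn(c, col(c).cast(DoubleType)))

  def main(args: Array[String]): Unit = {
    // Spark 2.x: SparkSession is the entry point; implicits come from the session instance.
    val spark = SparkSession.builder()
      .appName("label-features") // illustrative app name
      .master("local[*]")        // assumption: running locally
      .getOrCreate()
    import spark.implicits._

    // Assumption: both columns arrive as strings, as from a CSV read without inferSchema.
    val raw = Seq(
      ("0", "0"), ("0", "33"), ("0", "58"), ("0", "96"), ("0", "1"),
      ("1", "21"), ("0", "10"), ("0", "65"), ("1", "7"), ("1", "28")
    ).toDF("x", "y")

    val typed = castToDouble(raw, Seq("x", "y"))

    // Only "y" goes into the feature vector; "x" becomes the label.
    val output = new VectorAssembler()
      .setInputCols(Array("y"))
      .setOutputCol("features")
      .transform(typed)
      .select($"x".as("label"), $"features")

    val lrModel = new LogisticRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .fit(output)
    println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

    spark.stop()
  }
}
```

Inside spark-shell the `spark` session and its implicits are already available, so only the cast and the assembler steps are needed there.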