Scala: how to convert a DataFrame into label/features vectors?
I am running a logistic regression model in Scala, and my data frame is as follows: df

I need to turn it into this:
+-----+------------------+
|label| features |
+-----+------------------+
| 0.0|(1,[1],[0]) |
| 0.0|(1,[1],[33]) |
| 0.0|(1,[1],[58]) |
| 0.0|(1,[1],[96]) |
| 0.0|(1,[1],[1]) |
| 1.0|(1,[1],[21]) |
| 0.0|(1,[1],[10]) |
| 0.0|(1,[1],[65]) |
| 1.0|(1,[1],[7]) |
| 1.0|(1,[1],[28]) |
+-----+------------------+
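The target `features` column uses the notation that `show()` prints for Spark ML vectors (`org.apache.spark.ml.linalg`): a sparse vector appears as `(size,[indices],[values])` and a dense one as `[v0,v1,...]`. The sketch below builds a one-feature vector both ways. A size-1 vector has only index 0, so the `[1]` in the target table is presumably meant to be `[0]` (an assumption about the intended format). `VectorNotation` is just an illustrative name.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Spark ML stores org.apache.spark.ml.linalg.Vector values in a "features" column.
// show() prints a SparseVector as (size,[indices],[values]) and a DenseVector as [v0,v1,...].
object VectorNotation {
  // One feature holding y = 33, stored densely: prints as [33.0]
  val dense: Vector = Vectors.dense(33.0)
  // The same feature stored sparsely; the only valid index in a size-1 vector is 0
  val sparse: Vector = Vectors.sparse(1, Array(0), Array(33.0))
}
```

Both values describe the same vector; Spark compares vectors by their values, not by how they are stored.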
I tried:
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
val assembler = new VectorAssembler()
.setInputCols(Array("x"))
.setOutputCol("Feature")
var lrModel= lr.fit(daf.withColumnRenamed("x","label").withColumnRenamed("y","features"))
Thanks for your help.

Given the DataFrame
+---+---+
|x |y |
+---+---+
|0 |0 |
|0 |33 |
|0 |58 |
|0 |96 |
|0 |1 |
|1 |21 |
|0 |10 |
|0 |65 |
|1 |7 |
|1 |28 |
+---+---+
and running the following steps:
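import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.types.DoubleType
import spark.implicits._ // enables the $"col" syntax; spark is the SparkSession
// Combine x and y into a single Vector column named "features"
val assembler = new VectorAssembler()
.setInputCols(Array("x", "y"))
.setOutputCol("features")
// Use x, cast to Double, as the label and keep the assembled features
val output = assembler.transform(df).select($"x".cast(DoubleType).as("label"), $"features")
output.show(false)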
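In the code above, `x` goes into `features` even though it is also the label, so the label leaks into the model's inputs. The question's target format has only `y` as the feature. A minimal sketch of that variant, assuming a local `SparkSession` and the x/y data from the table above (`OneFeatureExample` is an illustrative name):

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object OneFeatureExample {
  // Local session for a standalone run; spark-shell already provides `spark`
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("one-feature")
    .getOrCreate()
  import spark.implicits._

  // The x/y data from the table above
  val df: DataFrame = Seq((0, 0), (0, 33), (0, 58), (0, 96), (0, 1),
      (1, 21), (0, 10), (0, 65), (1, 7), (1, 28))
    .toDF("x", "y")

  // Only y goes into the features vector; x is used solely as the label
  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(Array("y"))
    .setOutputCol("features")

  val output: DataFrame = assembler
    .transform(df)
    .select(col("x").cast(DoubleType).as("label"), col("features"))
}
```

Calling `OneFeatureExample.output.show(false)` then shows each row's features as a one-element vector, for example `[33.0]`.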
you get this result:
+-----+----------+
|label|features |
+-----+----------+
|0.0 |(2,[],[]) |
|0.0 |[0.0,33.0]|
|0.0 |[0.0,58.0]|
|0.0 |[0.0,96.0]|
|0.0 |[0.0,1.0] |
|1.0 |[1.0,21.0]|
|0.0 |[0.0,10.0]|
|0.0 |[0.0,65.0]|
|1.0 |[1.0,7.0] |
|1.0 |[1.0,28.0]|
+-----+----------+
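The first row prints as `(2,[],[])` because `VectorAssembler` stores each vector in whichever form is more compact, and an all-zero row is cheapest as a sparse vector with no stored entries. It is the same vector as `[0.0,0.0]`. A quick check using `org.apache.spark.ml.linalg.Vectors` (`ZeroRow` is an illustrative name):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

object ZeroRow {
  // Size 2 with no stored indices or values: printed as (2,[],[])
  val sparseZeros: Vector = Vectors.sparse(2, Array.empty[Int], Array.empty[Double])
  // The same values written out densely: printed as [0.0,0.0]
  val denseZeros: Vector = Vectors.dense(0.0, 0.0)
  // Vector equality compares values, not storage format
  val sameVector: Boolean = sparseZeros == denseZeros
  // compressed picks the smaller representation; for all zeros that is sparse
  val compressed: Vector = denseZeros.compressed
}
```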
Now using LogisticRegression is easy:
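import org.apache.spark.ml.classification.LogisticRegression
val lr = new LogisticRegression()
.setMaxIter(10)
.setRegParam(0.3)
.setElasticNetParam(0.8)
// output is the label/features DataFrame built above
val lrModel = lr.fit(output)
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")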
You will get output like this:
Coefficients: [1.5672602877378823,0.0] Intercept: -1.4055020984891717
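The assembling and fitting steps can also be chained into one `org.apache.spark.ml.Pipeline`, so the same feature assembly is reapplied automatically when scoring. A minimal sketch, assuming a local `SparkSession` and using only `y` as the feature to match the single-feature format asked for in the question; `PipelineExample` is an illustrative name:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object PipelineExample {
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("lr-pipeline")
    .getOrCreate()
  import spark.implicits._

  // The x/y data from the table above, with x cast to a Double "label" column
  val training: DataFrame = Seq((0, 0), (0, 33), (0, 58), (0, 96), (0, 1),
      (1, 21), (0, 10), (0, 65), (1, 7), (1, 28))
    .toDF("x", "y")
    .withColumn("label", col("x").cast(DoubleType))

  // Stage 1: pack the feature column into a Vector column named "features"
  val assembler: VectorAssembler = new VectorAssembler()
    .setInputCols(Array("y"))
    .setOutputCol("features")

  // Stage 2: the same hyperparameters as in the answer
  val lr: LogisticRegression = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.3)
    .setElasticNetParam(0.8)

  val model: PipelineModel = new Pipeline().setStages(Array(assembler, lr)).fit(training)

  // The fitted logistic regression is the last stage of the pipeline model
  val lrModel: LogisticRegressionModel =
    model.stages.last.asInstanceOf[LogisticRegressionModel]

  // transform() adds rawPrediction, probability and prediction columns
  val predictions: DataFrame = model.transform(training)
}
```

Because the assembler lives inside the pipeline, `model.transform` can be called on new data that only has a `y` column, without repeating the `VectorAssembler` step by hand.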
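It throws an error: VectorAssembler does not support the string type. Do you know why it throws this error? Also, the format is different from the one I gave the model.
Where did you get trip_status?
Sorry, that is actually just x. I can't import sqlContext.implicits._ directly; when I searched, it said this can't be done in Spark 2.0.0.
How do you create sqlContext?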
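On the two problems raised in the comments: since Spark 2.0 the entry point is `SparkSession`, and `import spark.implicits._` on a session value named `spark` provides the same implicits (`toDF`, `$"col"`) as `sqlContext.implicits._`. Separately, `VectorAssembler` rejects `StringType` input columns, so columns that arrive as strings (for example from a CSV read without schema inference) need to be cast to a numeric type first. A minimal sketch, assuming the string columns are named `x` and `y`; the sample rows and the name `StringColumnsExample` are hypothetical:

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

object StringColumnsExample {
  // Spark 2.x+: build (or reuse) a SparkSession instead of creating a SQLContext
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("string-columns")
    .getOrCreate()
  // Provides toDF and the $"col" syntax, replacing sqlContext.implicits._
  import spark.implicits._

  // Hypothetical input where both columns arrived as strings
  val raw: DataFrame = Seq(("0", "0"), ("0", "33"), ("1", "21"), ("1", "7")).toDF("x", "y")

  // Cast to Double first, because VectorAssembler rejects StringType columns
  val numeric: DataFrame = raw
    .withColumn("label", col("x").cast(DoubleType))
    .withColumn("y", col("y").cast(DoubleType))

  val output: DataFrame = new VectorAssembler()
    .setInputCols(Array("y"))
    .setOutputCol("features")
    .transform(numeric)
    .select("label", "features")
}
```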