StandardScaler returns NaN values
I am trying to build a linear regression model with PySpark and MLlib to predict a stock's closing price. The schema is shown below:
root
|-- Date: timestamp (nullable = true)
|-- Open: double (nullable = true)
|-- High: double (nullable = true)
|-- Low: double (nullable = true)
|-- Close: double (nullable = true)
|-- Adj Close: double (nullable = true)
|-- Volume: double (nullable = true)
For the input, I created a DenseVector containing the Open, High, Low, Adj Close, and Volume attributes and passed it to StandardScaler:
+--------+----------------------------------------------+
|target |features |
+--------+----------------------------------------------+
|2.77212 |[2.83162,3.53661,2.52112,2.77212,164329.0] |
|0.753325|[2.79376,2.79881,0.714725,0.753325,674188.0] |
|0.701897|[0.706136,0.87981,0.629191,0.701897,532170.0] |
|0.708448|[0.713989,0.729854,0.636546,0.708448,405283.0]|
|1.06786 |[0.708087,1.13141,0.663235,1.06786,1463100.0] |
+--------+----------------------------------------------+
The mean and the standard deviation both print as nan:
[nan,nan,nan,nan,nan]
[nan,nan,nan,nan,nan]
A similar question on SO suggested changing the data type, so I am now using double instead of float. In addition, the dataset contains no null values.
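Can someone explain what is going wrong here and what the correct approach is?
I could not reproduce your problem with Spark 3.0.1: I get normal values for scaler.mean and scaler.std.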