How to use sdf_predict() with a model produced by ml_pca() (sparklyr)

I fitted a PCA model:

> library(sparklyr)
> library(dplyr)
> sc <- spark_connect("local", version="2.0.0")
> iris_tbl <- copy_to(sc, iris, "iris", overwrite = TRUE)
The following columns have been renamed:
- 'Sepal.Length' => 'Sepal_Length' (#1)
- 'Sepal.Width'  => 'Sepal_Width'  (#2)
- 'Petal.Length' => 'Petal_Length' (#3)
- 'Petal.Width'  => 'Petal_Width'  (#4)
> pca_model <- tbl(sc, "iris") %>%
+   select(-Species) %>%
+   ml_pca()
> print(pca_model)
Explained variance:

       PC1         PC2         PC3         PC4 
0.924618723 0.053066483 0.017102610 0.005212184 

Rotation:
                     PC1         PC2         PC3        PC4
Sepal_Length -0.36138659 -0.65658877  0.58202985  0.3154872
Sepal_Width   0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal_Length -0.85667061  0.17337266 -0.07623608 -0.4798390
Petal_Width  -0.35828920  0.07548102 -0.54583143  0.7536574

Using this model with sdf_predict() ends with an error:

java.lang.IllegalArgumentException: requirement failed: 
The columns of A don't match the number of elements of x. A: 4, x: 0

Passing the data to predict on explicitly does not help:

sdf_predict(pca_model, tbl(sc, "iris") %>% select(-Species))

Source:   query [?? x 5]
Database: spark connection master=local[4] app=sparklyr local=TRUE

This ends with the same error:

java.lang.IllegalArgumentException: requirement failed: 
The columns of A don't match the number of elements of x. A: 4, x: 0

Is it possible to use a PCA model for prediction in Spark at all?

Instead of sdf_predict, use sdf_project:

> pca_projected <- sdf_project(pca_model, tbl(sc, "iris") %>% select(-Species), 
+                              features=rownames(pca_model$components))
> pca_projected %>% collect %>% head
# A tibble: 6 x 8
  Sepal_Length Sepal_Width Petal_Length Petal_Width   PC1   PC2   PC3     PC4
         <dbl>       <dbl>        <dbl>       <dbl> <dbl> <dbl> <dbl>   <dbl>
1         5.10        3.50         1.40       0.200 -2.82 -5.65 0.660 -0.0311
2         4.90        3.00         1.40       0.200 -2.79 -5.15 0.842  0.0657
3         4.70        3.20         1.30       0.200 -2.61 -5.18 0.614 -0.0134
4         4.60        3.10         1.50       0.200 -2.76 -5.01 0.600 -0.109 
5         5.00        3.60         1.40       0.200 -2.77 -5.65 0.542 -0.0946
6         5.40        3.90         1.70       0.400 -3.22 -6.07 0.463 -0.0576
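
To see what the projected columns contain, the following base-R sketch recomputes them without Spark. It assumes the projection is a plain matrix product of the uncentered features with the rotation matrix, which is consistent with the numbers printed above. The rotation values are copied from the print(pca_model) output, and the names rotation, features and projected are illustrative only.

```r
# Rotation matrix copied from print(pca_model) (rows = features, columns = PCs)
rotation <- matrix(
  c(-0.36138659, -0.65658877,  0.58202985,  0.3154872,
     0.08452251, -0.73016143, -0.59791083, -0.3197231,
    -0.85667061,  0.17337266, -0.07623608, -0.4798390,
    -0.35828920,  0.07548102, -0.54583143,  0.7536574),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    c("Sepal_Length", "Sepal_Width", "Petal_Length", "Petal_Width"),
    paste0("PC", 1:4)
  )
)

# First six rows of the built-in iris data, without Species
features <- as.matrix(head(iris[, 1:4]))
colnames(features) <- rownames(rotation)

# Assumption: plain matrix product, no centering or scaling of the features
projected <- features %*% rotation
print(round(projected, 3))
```

The first row comes out at about -2.82, -5.65, 0.660 and -0.031, the same values as the first row of the tibble returned by sdf_project.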
