Building a GLM model with SparkR on Windows is very slow, and the R code throws an error


The dataset is large: 30 columns and 200,000 records. I am building a GLM model with SparkR, but fitting the model takes far too long and also fails with an error. How can I reduce the model-building time with SparkR and fix the error shown below? Please give me some suggestions to improve this code.

R code:

# Initialize Spark

Sys.setenv(SPARK_HOME="C:/spark/spark-2.0.0-bin-hadoop2.7")
# Set the library paths

.libPaths(c(file.path(Sys.getenv("SPARK_HOME"),"R","lib"), .libPaths()))

Sys.setenv(JAVA_HOME="C:/Program Files/Java/jdk1.7.0_71")
# Load the SparkR library

library(SparkR)
library(rJava)

sc <- sparkR.session(enableHiveSupport = FALSE, master = "local[*]",
                     appName = "SparkR-Modi",
                     sparkConfig = list(spark.sql.warehouse.dir = "file:///c:/tmp/spark-warehouse"))
sqlContext <- sparkRSQL.init(sc)
spdf <- read.df(sqlContext, "C:/Users/prasann/Desktop/V/bigdata11.csv",
                source = "com.databricks.spark.csv", header = "true")
showDF(spdf)
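One likely cause of the slowness is the CSV read itself: without schema inference, the spark-csv source loads every column as a string, which forces glm() to treat numeric features as high-cardinality categoricals. A minimal sketch of a faster read, assuming the same Spark 2.0 session and file path as above (untested here, since it needs a local Spark installation):

```r
# Enable schema inference so numeric columns arrive as doubles, not strings
# (inferSchema is a documented option of the com.databricks.spark.csv source).
spdf <- read.df(sqlContext, "C:/Users/prasann/Desktop/V/bigdata11.csv",
                source = "com.databricks.spark.csv",
                header = "true", inferSchema = "true")

# Cache the DataFrame so the iterative GLM solver does not re-read and
# re-parse the CSV on every pass over the data.
spdf <- cache(spdf)

# Verify that the numeric columns really were inferred as numeric types.
printSchema(spdf)
```

Caching is especially relevant for GLM fitting, which makes multiple passes over the data during iteratively reweighted least squares.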

Can anyone help? Could you provide a sample of the data that produces this error, so the problem can be reproduced?
> md <- glm(NP_OfferCurrentResponse ~ ., family = "binomial", data = spdf)
Error in invokeJava(isStatic = TRUE, className, methodName, ...) : 
java.lang.AssertionError: assertion failed: lapack.dppsv returned 226.
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.mllib.linalg.CholeskyDecomposition$.solve(CholeskyDecomposition.scala:40)
at org.apache.spark.ml.optim.WeightedLeastSquares.fit(WeightedLeastSquares.scala:140)
at org.apache.spark.ml.regression.GeneralizedLinearRegression$FamilyAndLink.initialize(GeneralizedLinearRegression.scala:340)
at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:275)
at org.apache.spark.ml.regression.GeneralizedLinearRegression.train(GeneralizedLinearRegression.scala:139)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:90)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:71)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:149)
at org.apache.spark.ml.Pipeline$$anonfun$fit$2.apply(Pipeline.scala:145)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.c
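The `lapack.dppsv` assertion means the Cholesky factorization of the normal-equations matrix failed: the matrix is not positive definite, which in practice almost always comes from constant or linearly dependent predictor columns. With 30 columns and a `~ .` formula it is easy to pick up a redundant one. A hedged sketch of a pre-fit check, assuming the data (or a sample of it) is small enough to inspect on the driver (column names and the 5% fraction are illustrative, not from the original post):

```r
# Pull a sample to the driver to look for degenerate columns.
local_df <- collect(sample(spdf, withReplacement = FALSE, fraction = 0.05))

# Columns with a single distinct value contribute no information and
# make the Gram matrix singular, which is what dppsv is complaining about.
constant_cols <- names(local_df)[sapply(local_df,
                                        function(x) length(unique(x)) <= 1)]

# Keep everything else and refit on the reduced DataFrame.
keep  <- setdiff(names(local_df), constant_cols)
spdf2 <- select(spdf, keep)

md <- glm(NP_OfferCurrentResponse ~ ., family = "binomial", data = spdf2)
```

If the error persists after removing constant columns, look for pairs of exactly or nearly collinear predictors (e.g. one column that is a linear function of another) and drop one of each pair before fitting.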