SparkR 2.0 is very slow

I have just started testing SparkR 2.0 and found that dapply executes very slowly.

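For example, subsetting a plain R data.frame (100,000 rows, 10 columns) with the following code is almost instantaneous: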

set.seed(2)
# 100,000 x 10 data.frame of standard normal values (columns X1..X10)
random_DF<-data.frame(matrix(rnorm(1000000),100000,10))
system.time(dummy_res<-random_DF[random_DF[,1]>1,])

user  system elapsed 
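0.005   0.000   0.006

Doing the same subset with dapply in SparkR, on a local session with 4 cores and 4 partitions, takes more than a minute:

set.seed(2)
random_DF<-data.frame(matrix(rnorm(1000000),100000,10))
sparkR.session(master = "local[4]")

# copy the local data.frame into Spark and split it into 4 partitions
random_DF_Spark <- repartition(createDataFrame(random_DF),4)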

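# dapply() runs the R function once per partition, passing it an R data.frame
subset_DF_Spark <- dapply(
    random_DF_Spark,
    function(x) {
        # keep the rows whose first column is greater than 1
        y <- x[x[1] > 1, ]
        y
    },
    schema(random_DF_Spark))  # output has the same schema as the input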

# dapply() is lazy; collect() triggers the job and returns a local data.frame
system.time(dummy_res_Spark<-collect(subset_DF_Spark))

user  system elapsed 
2.003   0.119  62.919
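For comparison, here is a sketch of the same subset written as a SparkR column expression with filter() instead of an R function inside dapply(). dapply() hands each partition to an R worker process as a data.frame and gets a data.frame back, while a Column condition such as X1 > 1 is evaluated by Spark itself, so no R code runs on the workers. This is an illustrative sketch, not a measured result: it assumes a Spark 2.x installation with SPARK_HOME set, and the names subset_filter_Spark and dummy_res_filter are hypothetical. How much of the 62-second gap it would recover is not established here.

```r
# Sketch (assumption: Spark 2.x installed, SPARK_HOME set): the same row subset
# via SparkR's native filter() instead of dapply().
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

set.seed(2)
random_DF <- data.frame(matrix(rnorm(1000000), 100000, 10))  # columns X1..X10

# Local baseline, as in the question
dummy_res <- random_DF[random_DF[, 1] > 1, ]

sparkR.session(master = "local[4]")
random_DF_Spark <- repartition(createDataFrame(random_DF), 4)

# The condition is a Spark Column expression, evaluated inside Spark,
# so no R function has to be run on each partition.
# SparkR::filter is written explicitly because stats/dplyr also define filter().
subset_filter_Spark <- SparkR::filter(random_DF_Spark, random_DF_Spark$X1 > 1)

# filter() is lazy as well; collect() runs the job and returns a local data.frame
print(system.time(dummy_res_filter <- collect(subset_filter_Spark)))

# Both approaches should keep the same number of rows
print(nrow(dummy_res_filter) == nrow(dummy_res))

sparkR.session.stop()
```

Timing the collect() of this version alongside the dapply() version on the same machine would show how much of the cost comes from running an R function per partition rather than from Spark itself.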