Replacing the gapply function in SparkR with other code


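How can I replace the gapply function in SparkR? I want to aggregate some data and apply a function to the aggregated data. I do not want to call collect(), so the data should never be returned to the driver machine.

I have a Spark DataFrame: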

       CNPJ              PID              DATA     N
 23104577000149   7898586660649     2016-02-01     2
 23104577000149   7898132542078     2016-02-01     2
 11660954000147   7898944830295     2016-02-01     2
 10140281000131   7896496920747     2016-02-01     1
 23104577000149   7891772150900     2016-02-01     1
 10140281000131   789895720413854   2016-01-31     1

I want to aggregate by the CNPJ and PID fields, like this:

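# Schema of the data frame returned for each group
schema <- structType(structField("CNPJ", "string"),
                     structField("PID", "string"),
                     structField("DATA", "date"),
                     structField("N", "double"))

# Note: e_date (the last date to fill) is not defined in this snippet;
# it is assumed to exist in the calling environment.
result <- gapply(
  ds_filtered,
  c("CNPJ", "PID"),
  function(key, x) {
    # One row per day from the group's first date through e_date
    dts <- data.frame(key, DATA = seq(min(as.Date(x$DATA)), as.Date(e_date), "days"))
    colnames(dts)[c(1, 2)] <- c("CNPJ", "PID")
    # The observed rows for this group
    y <- data.frame(key, DATA = as.Date(x$DATA), N = x$N)
    colnames(y)[c(1, 2)] <- c("CNPJ", "PID")
    # Keep every day; days without an observation get N = NA
    y <- dplyr::left_join(dts, y, by = c("CNPJ", "PID", "DATA"))
    # Column 4 is N: replace the missing counts with 0
    y[is.na(y$N), 4] <- 0
    data.frame(CNPJ = as.character(y$CNPJ),
               PID = as.character(y$PID),
               DATA = as.Date(y$DATA),
               N = y$N)
  },
  schema
)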

Hello zero323! I edited my question to include an example. Thanks!

Add stringsAsFactors = FALSE to the data frame returned from gapply().
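
The suggestion about stringsAsFactors can be illustrated with a small sketch of the per-group function, written as plain R so it can be checked on one group without a Spark session. The name fill_missing_days and the explicit e_date argument are assumptions made for this illustration (the question reads e_date from the surrounding environment), and base merge() is swapped in for dplyr::left_join to keep the sketch dependency-free.

```r
# Hypothetical helper (not from the question): fills in missing days with N = 0
# for a single (CNPJ, PID) group. `key` is the list of grouping values that
# gapply passes to its function; `x` is the group's rows as an R data.frame.
fill_missing_days <- function(key, x, e_date) {
  # Every day from the group's first date through e_date
  days <- seq(min(as.Date(x$DATA)), as.Date(e_date), by = "day")
  dts <- data.frame(CNPJ = as.character(key[[1]]),
                    PID = as.character(key[[2]]),
                    DATA = days,
                    stringsAsFactors = FALSE)  # keep strings as character, not factor
  y <- data.frame(DATA = as.Date(x$DATA), N = as.numeric(x$N))
  # Left join in base R: keep every day, attach N where an observation exists
  out <- merge(dts, y, by = "DATA", all.x = TRUE)
  out$N[is.na(out$N)] <- 0
  # Restore the column order declared in the schema
  out <- out[order(out$DATA), c("CNPJ", "PID", "DATA", "N")]
  rownames(out) <- NULL
  out
}

# Local check on one group (sample values chosen for illustration)
grp <- data.frame(DATA = c("2016-01-30", "2016-02-01"), N = c(1, 2),
                  stringsAsFactors = FALSE)
res <- fill_missing_days(list("23104577000149", "7898586660649"), grp, "2016-02-02")
print(res)

# With a Spark session, the same function could be handed to gapply:
# result <- gapply(ds_filtered, c("CNPJ", "PID"),
#                  function(key, x) fill_missing_days(key, x, e_date),
#                  schema)
```

Keeping the CNPJ and PID columns as character vectors matters because the schema declares them as "string"; a factor column in the returned data frame would not match that declaration.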