Problem converting lapply() to spark.lapply()

Tags: r, lapply, sparkr

I am having trouble converting an lapply() call in R to spark.lapply(). My R code looks like this:

> lst <- lapply(1:(length(SampleData$A)-n), function(i) SampleData$A[i:(i+n)])
> names(lst) <- paste0("SampleData$A", seq_along(lst))
> list2env(lst, envir = .GlobalEnv)

I am fairly new to SparkR, so any help would be much appreciated. Thank you all!

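In my (limited) experience with spark.lapply, the main thing you need to do is make sure your namespace is explicit, especially when you use external packages.

In other words, try to pass in explicitly any other variables that the function given to spark.lapply needs to know about. Although the help file says it generally picks things up from the global environment, being explicit lets you stay sane when that does not work.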
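As a rough illustration of that point (not part of the original answer), the sketch below contrasts a function that silently reads globals with one that receives everything through its argument. The names `window_implicit`, `window_explicit`, and `as_isolated` are hypothetical, and `as_isolated` is only a crude base-R stand-in for a worker that did not receive the driver's globals. SparkR does try to capture referenced variables, so the implicit version often works in practice; this simulates the case where it does not.

```r
# Base R only; no Spark session is needed to run this sketch.
n <- 3
values <- c(5.1, 4.9, 4.7, 4.6, 5.0)

# Implicit: relies on `values` and `n` being found in the global environment.
window_implicit <- function(i) values[i:(i + n)]

# Explicit: everything the function needs arrives through its argument.
window_explicit <- function(task) task$values[task$i:(task$i + task$n)]

# Hypothetical helper: re-home a function so that it can only see base R,
# which mimics a worker that has none of the driver's global variables.
as_isolated <- function(f) {
  environment(f) <- baseenv()
  f
}

# The implicit version fails once the globals are out of reach;
# the error message is captured as a string instead of stopping the script.
implicit_result <- tryCatch(
  as_isolated(window_implicit)(1),
  error = function(e) conditionMessage(e)
)

# The explicit version still works, because its inputs travel with the call.
explicit_result <- as_isolated(window_explicit)(list(i = 1, values = values, n = n))
```

With a running SparkR session, a list of such task elements could be passed to `spark.lapply()` together with `window_explicit` in the same way.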

In pseudocode, the spark.lapply call should look like this:

spark.lapply([(x1, y1), (x2, y2), (x3, y3)], function(x) do_stuff(x[1], x[2]))

where do_stuff should not depend on anything outside its own environment. In my experience, any option settings, for example options(na.action = "na.pass"), need to be defined inside the function. The manual also tells you to load again, inside the function, any libraries you rely on.
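A minimal sketch of what "define it inside the function" can look like is shown here. The function `fit_slope`, the task layout, and the choice of `na.action = "na.omit"` are assumptions made for illustration; `stats` stands in for whatever external package the real function would need.

```r
# `fit_slope` is a hypothetical worker function used only for illustration.
fit_slope <- function(task) {
  # Load the packages the function needs inside the function itself.
  library(stats)

  # Set options locally and restore the previous values when the function exits.
  old_opts <- options(na.action = "na.omit")
  on.exit(options(old_opts), add = TRUE)

  # Fit a simple linear model on the data carried by this task.
  fit <- lm(y ~ x, data = task$data)
  unname(coef(fit)["x"])
}

# Each task carries its own data, so nothing is read from the global environment.
tasks <- list(
  list(data = data.frame(x = c(1, 2, 3, 4), y = c(2, 4, NA, 8))),
  list(data = data.frame(x = c(1, 2, 3, 4), y = c(3, 6, 9, 12)))
)

# Local run with base lapply.
slopes <- lapply(tasks, fit_slope)

# With an active SparkR session, the same function can be passed unchanged:
# slopes <- SparkR::spark.lapply(tasks, fit_slope)
```

Because the options and the library call live inside `fit_slope`, the function behaves the same whether it runs locally or is shipped to a worker.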

As for your code, I would modify it as follows:

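library(SparkR)
sparkR.session()  # spark.lapply needs an active SparkR session

# n is passed in explicitly rather than read from the global environment
count <- function(i, df2, n) {
  df2$Sepal.Length[i:(i + n)]
}

df2 <- iris
n <- 3

# build a list of parameter sets, as in the pseudocode above;
# each element has the form list(i = integer, df2 = data.frame, n = number)
input_list <- lapply(1:(length(df2$Sepal.Length) - n), function(x) list(i = x, df2 = df2, n = n))

# the same computation as in the question, locally and on Spark
lst <- lapply(input_list, function(x) count(x$i, x$df2, x$n))
splst <- spark.lapply(input_list, function(x) count(x$i, x$df2, x$n))

# simpler alternative: iterate over the indices and let df2 and n be captured by the closure
lst <- lapply(1:(length(df2$Sepal.Length) - n), function(x) count(x, df2, n))
splst <- spark.lapply(1:(length(df2$Sepal.Length) - n), function(x) count(x, df2, n))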

This usually works, but it sometimes fails when the objects involved are not standard R types (for example, xgb.DMatrix objects).

Can you post some sample data?

@ManikantaMaheshByra > lst$SampleData$A1 lst$SampleData$A2 lst$SampleData$A3