
Running an R script on multiple cores in Linux


I have a function called `my_function` that I want to run in parallel on multiple cores in Linux using `mclapply`. I have 10 `cty_id` values and 20 years of `val_yr` to run for each `cty_id`. How can I use `mclapply` with 4 cores to make this run fast? I have tested my function one county and one year at a time, and it works fine. But I want to speed up the process instead of manually changing the county and year one at a time.

cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)

my_function <- function(cty_id,val_yr) {

<do something here> ()

}
Can someone help me make this run faster?

Please let me know which global variables I need to define.
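For the fork-based approach the question asks about, a minimal sketch with `parallel::mcmapply` over the county-by-year grid might look like this (the function body here is only a stand-in for the real `my_function`):

```r
library(parallel)

cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- 1998:2017

# Placeholder body; the real my_function would do the per-county work
my_function <- function(cty, yr) paste(cty, yr, sep = "_")

# One row per (county, year) combination: 10 x 20 = 200 rows
grid <- expand.grid(cty_id = cty_id, val_yr = val_yr)

# mcmapply() forks child processes (Linux/macOS only; on Windows
# mc.cores must be 1) and applies my_function element-wise over the
# two columns, 4 workers at a time
res <- mcmapply(my_function, grid$cty_id, grid$val_yr, mc.cores = 4)
```

Because the workers are forks of the current session, any data frames already loaded are visible to `my_function` without explicit exporting.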

The revised R code file is below:
library(data.table)   # fread()
library(dplyr)        # full_join()
library(doFuture)     # registerDoFuture(), plan()
library(doParallel)   # registerDoParallel(); loads parallel for detectCores(), makeCluster()
library(foreach)      # foreach() %dopar%

in_file11 <- 'PLSS_KS_All_1999_2017.txt' 
in_file12 <- 'PLSS_KS_All_WeeklySM_1998_2017_BILINEAR.txt' 
in_file13 <- 'PRISM_WeeklyPrcp_Sum_800m_1998_2017_BILINEAR.txt'

in_data11 <- fread(in_file11,drop = 1)
in_data12 <- fread(in_file12,drop = 1)
in_data13 <- fread(in_file13,drop = 1)

in_datan <- as.data.table(full_join(in_data12, in_data13))
in_data1 <- as.data.table(full_join(in_data11, in_datan))

in_file2 <- 'KS_pp_Wheat_hist_YieldID_1998_2017.csv' 
in_file3 <- 'All_counties_1999_2017.csv'

in_data2 <- fread(in_file2)
in_data3 <- fread(in_file3)

years <- c(1998:2017)
st_id <- c(15)  
crop_id <- c(11)

my_function <- function(cty_id,val_yr) {

<do something here> ()

}


registerDoFuture()
plan(multiprocess)
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)

plan(cluster, workers = cluztrr)


county_id <- c(19,205)
val_year <- c(1998:1999)

foo <- expand.grid(county_id,val_year)


foreach(i = 1:nrow(foo), globals = c("in_data1","in_data2","in_data3"), .export = c("years","st_id","crop_id")) %dopar% {
  my_function(foo[i,]$Var1,foo[i,]$Var2)
}

stopCluster(cluztrr)

Error in { : task 1 failed - "object 'in_data1' not found"
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
  already exporting variable(s): st_id, crop_id
`in_file11` is now a future :-)

Use the `future` package for parallel computation in R; `doFuture` is an adapter package for `foreach` loops (`%dopar%`):

library(doFuture)
registerDoFuture()
plan(multiprocess)

Thanks. I have revised the code and run it, but I get this error (see above). @user3408139

You have to identify the objects being exported and define them as instructed, e.g. globals = c("a", "slow_sum"). In the source code you can see this issue comes from objects that are too large.

OK. Can you tell me what I should do then? I am relatively new to R programming.

There must be large objects in your R session. Restart the session before running the code, then try to identify which object is causing this. Do you have a large matrix or data frame? Try globals = c("largeMATRIX", "largeDF").

I am reading large text files into data frames. Please see my entire R script above; that is what I am running in Linux. When you say "largeMATRIX", do I have to define all the data frames as globals after reading the text files? Please help me understand how to run this script. Thanks again.

You don't need that many packages. Only data.table for fread and whatever `lr_pass` calls. When does the error occur? After which line?
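The export mechanics discussed in these comments can be sketched in isolation. With a plain doParallel backend, `.export` names the objects to copy from the calling environment to each worker; `foreach` also auto-detects variables referenced in the loop body, which is why listing them in `.export` as well produces the "already exporting variable(s)" warning seen above (the `in_data1` here is only a tiny stand-in for the real `fread()` result):

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)
registerDoParallel(cl)

in_data1 <- data.frame(x = 1:5)  # stand-in for the real fread() result

# .export copies the named objects to each worker; foreach would also
# find in_data1 on its own by scanning the %dopar% body, so this is
# belt-and-braces (and may warn "already exporting variable(s)")
res <- foreach(i = 1:2, .export = "in_data1") %dopar% {
  nrow(in_data1)
}
stopCluster(cl)
```

With large data frames the copy happens once per worker, so the cost is paid at startup rather than per iteration.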
library(doFuture)
registerDoFuture()
plan(multiprocess)

cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)
my_function <- function(X,Y) {
    cat(X, Y, "\n")
}

result <- foreach(i = cty_id) %dopar% {
    foreach(j = val_yr) %do% {
        my_function(i, j)
    }
}
A <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
B <- c(1998:2017)
foo <- expand.grid(A, B)

myFunction <- function(X, Y) {
    cat(X, Y, "\n")
}

foreach(i = 1:nrow(foo)) %dopar% {
    myFunction(foo[i, ]$Var1, foo[i, ]$Var2)
}
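Pulling the pieces together, a minimal self-contained version of this approach might look like the following. The mixed `makeCluster()`/`registerDoParallel()`/`plan(cluster)` setup from the question is dropped in favour of a single doFuture backend, and `multisession` stands in for `multiprocess`, which is deprecated in recent releases of the future package:

```r
library(doFuture)
registerDoFuture()
plan(multisession, workers = 4)  # portable replacement for plan(multiprocess)

A <- c(205, 15)   # small subset of the county ids, for illustration
B <- 1998:1999
foo <- expand.grid(A, B)

myFunction <- function(X, Y) paste(X, Y, sep = "-")

# With doFuture, the globals (foo, myFunction) are identified and
# shipped to the workers automatically -- no .export needed
out <- foreach(i = 1:nrow(foo)) %dopar% {
  myFunction(foo[i, ]$Var1, foo[i, ]$Var2)
}

plan(sequential)  # shut the workers down
```

If an "object not found" error still appears with the real data, the cause is usually the globals size limit, which can be raised via `options(future.globals.maxSize = ...)`.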