Running an R script on multiple cores in Linux
I have a function called "my_function" that I want to run in parallel on multiple cores in Linux using mclapply. I have 10 cty_id values, and 20 years to run for each cty_id. How can I use mclapply with 4 cores to make this run fast? I have tested my function by running one county and one year at a time, and it works fine. However, I want to speed the process up instead of manually changing the county and the year one at a time.
cty_id <- c(205,15,37,59,25,133,11,23,21,19)
val_yr <- c(1998:2017)
my_function <- function(cty_id, val_yr) {
  # <do something here>
}
Can someone help me make this run faster?
Please let me know which global variables I need to define.
The revised R code file is below:
in_file11 <- 'PLSS_KS_All_1999_2017.txt'
in_file12 <- 'PLSS_KS_All_WeeklySM_1998_2017_BILINEAR.txt'
in_file13 <- 'PRISM_WeeklyPrcp_Sum_800m_1998_2017_BILINEAR.txt'
in_data11 <- fread(in_file11,drop = 1)
in_data12 <- fread(in_file12,drop = 1)
in_data13 <- fread(in_file13,drop = 1)
in_datan <- as.data.table(full_join(in_data12, in_data13))
in_data1 <- as.data.table(full_join(in_data11, in_datan))
in_file2 <- 'KS_pp_Wheat_hist_YieldID_1998_2017.csv'
in_file3 <- 'All_counties_1999_2017.csv'
in_data2 <- fread(in_file2)
in_data3 <- fread(in_file3)
years <- c(1998:2017)
st_id <- c(15)
crop_id <- c(11)
my_function <- function(cty_id, val_yr) {
  # <do something here>
}
registerDoFuture()
plan(multiprocess)
num.cores <- detectCores()-1
cluztrr <- makeCluster(num.cores)
registerDoParallel(cl = cluztrr)
plan(cluster, workers = cluztrr)
county_id <- c(19,205)
val_year <- c(1998:1999)
foo <- expand.grid(county_id,val_year)
foreach(i = 1:nrow(foo), globals = c("in_data1","in_data2","in_data3"), .export = c("years","st_id","crop_id")) %dopar% {
my_function(foo[i,]$Var1,foo[i,]$Var2)
}
stopCluster(cluztrr)
Error in { : task 1 failed - "object 'in_data1' not found"
In addition: Warning message:
In e$fun(obj, substitute(ex), parent.frame(), e$data) :
already exporting variable(s): st_id, crop_id
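The future is now :-)
Use the future package for parallel computing in R. doFuture is its companion package for foreach loops: load it with library(doFuture), call registerDoFuture(), and then choose a plan().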
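Thanks. I have modified the code and run it, but I get this error (see above).
user3408139, you have to identify the objects that are being exported and define them as described, for example: globals = c("a", "slow_sum").
In the source code you can see that this problem comes from an object size that is too large.
OK. Can you tell me what I should do then? I am relatively new to R programming.
You probably have large objects in your R session. Restart the session before running the code, then try to identify which object is causing this. Do you have large matrices or data frames? Try globals = c("largeMATRIX", "largeDF").
I am reading large text files and data frames. Please see my whole R script above; that is what I run in Linux. When you say "largeMATRIX", do I have to define all the data frames as globals after reading the text files? Please help me understand how to run this script. Thanks again.
You don't need that many packages. Only data.table for fread, plus whatever lr_pass calls. When does the error occur? After which line?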
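One possible reading of the "object 'in_data1' not found" error: as far as I know, foreach() has no globals argument of its own, so an extra named argument such as globals = c(...) is treated as just another loop variable, and the data tables never reach the workers. The script also registers doParallel after doFuture, so the doParallel backend is the one in effect. Below is a minimal sketch, assuming a doParallel cluster, in which every object that my_function reads from the global environment is listed in .export. The small in_data1 table and the body of my_function are made-up stand-ins, not the real data or code.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-ins for the large tables read with fread() in the question
in_data1 <- data.frame(cty_id = c(19, 205), value = c(1.5, 2.5))
st_id <- 15

# Hypothetical body: reads in_data1 and st_id from the global environment
my_function <- function(cty_id, val_yr) {
  rows <- in_data1[in_data1$cty_id == cty_id, ]
  data.frame(st_id = st_id, cty_id = cty_id, val_yr = val_yr, value = rows$value)
}

cl <- makeCluster(2)
registerDoParallel(cl)

foo <- expand.grid(cty_id = c(19, 205), val_yr = 1998:1999)

# foo and my_function appear in the loop body, so foreach exports them itself;
# in_data1 and st_id are only used inside my_function, so they go in .export
res <- foreach(i = seq_len(nrow(foo)), .combine = rbind,
               .export = c("in_data1", "st_id")) %dopar% {
  my_function(foo$cty_id[i], foo$val_yr[i])
}

stopCluster(cl)
```

Listing a variable in .export that foreach already picks up from the loop body is what seems to trigger the "already exporting variable(s)" warning, so only the objects hidden inside the function need to be named.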
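library(doFuture)
registerDoFuture()
# plan(multiprocess) is deprecated in recent releases of future;
# multisession (or multicore on Linux) takes its place
plan(multisession, workers = 4)
cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- c(1998:2017)
my_function <- function(X, Y) {
  cat(X, Y, "\n")
}
# Counties run in parallel; the years for each county run sequentially
result <- foreach(i = cty_id) %dopar% {
  foreach(j = val_yr) %do% {
    my_function(i, j)
  }
}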
# Alternative: one task per (county, year) combination
A <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
B <- c(1998:2017)
foo <- expand.grid(A, B)
my_function <- function(X, Y) {
  cat(X, Y, "\n")
}
foreach(i = 1:nrow(foo)) %dopar% {
  my_function(foo[i, ]$Var1, foo[i, ]$Var2)
}
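Since the question asks specifically about mclapply with 4 cores, here is a minimal sketch of that route, assuming a Linux (or other Unix-like) machine. mclapply forks the running R process, so objects already loaded in the session, such as the tables read with fread(), should be visible to the workers without any explicit export. The body of my_function below is a hypothetical stand-in.

```r
library(parallel)

cty_id <- c(205, 15, 37, 59, 25, 133, 11, 23, 21, 19)
val_yr <- 1998:2017

# Hypothetical stand-in for the real my_function
my_function <- function(cty_id, val_yr) {
  data.frame(cty_id = cty_id, val_yr = val_yr)
}

# One row per (county, year) combination: 10 x 20 = 200 tasks
grid <- expand.grid(cty_id = cty_id, val_yr = val_yr)

# Forked workers inherit the parent session's objects (Unix-like systems only)
results <- mclapply(
  seq_len(nrow(grid)),
  function(i) my_function(grid$cty_id[i], grid$val_yr[i]),
  mc.cores = 4
)

# Stack the per-task results into a single data frame
out <- do.call(rbind, results)
```

Running over the rows of expand.grid rather than over counties alone gives 200 small tasks instead of 10 large ones, which tends to keep all 4 cores busy when some county/year runs take longer than others.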