
R: How to parallelize multi-panel plotting with lattice in R 3.2.1?

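I am new to R programming and would like to know how to run plot in parallel on 12 trellis objects made with the lattice package.

Basically, after a lot of preprocessing steps, I have the following commands: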

plot(adhd_plot, split = c(1,1,4,3)) #plot adhd trellis object at 1,1 in a grid of 4 by 3 i.e 4 COLUMNS x 3 ROWS
plot(bpd_plot, split = c(2,1,4,3), newpage = FALSE) #plot bpd trellis object in 2nd column in a grid of 4 columns x 3 rows
plot(bmi_plot, split = c(3,1,4,3), newpage = FALSE)
plot(dbp_plot, split = c(4,1,4,3), newpage = FALSE)
plot(height_plot, split = c(1,2,4,3), newpage = FALSE)
plot(hdl_plot, split = c(2,2,4,3), newpage = FALSE)
plot(ldl_plot, split = c(3,2,4,3), newpage = FALSE)
plot(ra_plot, split = c(4,2,4,3), newpage = FALSE)
plot(sbp_plot, split = c(1,3,4,3), newpage = FALSE)
plot(scz_plot, split = c(2,3,4,3), newpage = FALSE)
plot(tc_plot, split = c(3,3,4,3), newpage = FALSE)
plot(tg_plot, split = c(4,3,4,3), newpage = FALSE)
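
The problem is that, although the above commands work, they take a very long time (>4 hours) on Mac OS X to produce the figure.

Since my Mac has 8 cores, I thought I would try splitting the plot commands across the cores to speed up the plotting.

After searching other parallelization questions, I found the doParallel package and thought I could use the parLapply function with it, as follows: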

library(doParallel)
detectCores()
cl <- makeCluster(6) #6 out of 8 cores
registerDoParallel(cl)
parLapply(cl, list_of_all_trellis_objects, plot)

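As suggested in the comments, it is not possible to write to a plotting device in parallel.

Some workarounds to speed up drawing a single plot:

  • Reduce the number of points in the QQ plot, see:

  • Load the data faster by applying the following tips:

  • You could try drawing/saving multiple plots in parallel (with each plot using the methods from points 1 and 2), but writing to disk may become a serious bottleneck.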

  • Edit:

    Below is rough code for drawing fast QQ plots:

    find_conf_intervals = function(row){
      i = row[1]
      len = row[2]
      if (i < 10000 | i %% 100 == 0){
        return(c(-log10(qbeta(0.95,i,len-i+1)), -log10(qbeta(0.05,i,len-i+1))))
      } else { # Speed up
        return(c(NA,NA))
      }
    }
    
    confidence.intervals <- function(e){
      xspace = 0.078
      print("1")
      ci = apply(cbind( 1:length(e), rep(length(e),length(e))), MARGIN=1, FUN=find_conf_intervals)
      print("2")
      bks = append(seq(10000,length(e),100),length(e)+1)
      print("3")
      for (i in 1:(length(bks)-1)){
        ci[1, bks[i]:(bks[i+1]-1)] = ci[1, bks[i]]
        ci[2, bks[i]:(bks[i+1]-1)] = ci[2, bks[i]]
      }
      colnames(ci) = names(e)
      ## Extrapolate to make plotting prettier (doesn't affect interpretation at data points)
      slopes = c((ci[1,1] - ci[1,2]) / (e[1] - e[2]), (ci[2,1] - ci[2,2]) / (e[1] - e[2]))
      print("4")
      extrap_x = append(e[1]+xspace,e) ## extrapolate slightly for plotting purposes only
      extrap_y = cbind( c(ci[1,1] + slopes[1]*xspace, ci[2,1] + slopes[2]*xspace), ci)
      print("5")
      polygon(c(extrap_x, rev(extrap_x)), c(extrap_y[1,], rev(extrap_y[2,])),
              col = "grey81", border = "grey81")
    }
    
    quant.subsample <- function(y, m=100, e=1) {
      ## m: size of a systematic sample
      ## e: number of extreme values at either end to use
      x <- sort(y)
      n <- length(x)
      quants <- (1 + sin(1:m / (m+1) * pi - pi/2))/2
      sort(c(x[1:e], quantile(x, probs=quants), x[(n+1-e):n]))
      ## Returns m + 2*e sorted values from the EDF of y
    }
    
    get.points <- function(pv) {
      d = suppressWarnings(as.numeric(pv)) # coerce to numeric; unparseable entries become NA
      names(d) = names(pv)
      d = d[!is.na(d)]
      d = d[d>0 & d<1]
      d = d[order(d,decreasing=F)]
      y = -log10(d)
      x = -log10( ppoints(length(d) ))
      m <- 0.001 * length(x)
      e <- floor(0.0005 * length(x))
      return(list(x=quant.subsample(x, m, e), y=quant.subsample(y, m, e)))
    }
    
    fqq <- function(x, y, ...) {
      plot(0,
           col=FALSE,
           xlim=range(x),
           ylim=range(y),
           xlab=expression(Expected~~-log[10](italic(p))),
           ylab=expression(Observed~~-log[10](italic(p))),
           ...)
      abline(0,1,col=2)
      points(x,y, ...)
    }
    
    args <- commandArgs(trailingOnly = TRUE)
    pv.f = args[1]
    qq.f = args[2]
    nrows = as.numeric(args[3])
    message(Sys.time())
    message("READING")
    d <- read.table(pv.f, header=TRUE, sep=" ", nrows=nrows, colClasses=c("numeric"))
    message(Sys.time())
    message("LAMBDA")
    chisq <- qchisq(1-d$P_VAL,1)
    lambda = median(chisq)/qchisq(0.5,1)
    message(Sys.time())
    message("PLOTTING")
    p <- get.points(d$P_VAL)
    png(file=qq.f)
    fqq(p$x, p$y, main=paste(pv.f, lambda, sep="\n"), cex.axis=1.5, cex.lab=1.5)
    dev.off()
    message(Sys.time())
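
The third workaround above (rendering several plots in parallel, each to its own file, after thinning the points) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script above: `render_qq`, the toy `pvals` list, and the rounding-based thinning are hypothetical stand-ins, and the thinning here is a simpler technique than the quantile subsampling in `quant.subsample`. Each worker opens and closes its own PNG device, so no two processes ever write to the same device.

```r
library(parallel)

# Hypothetical helper: thin one trait's QQ points and draw them to a PNG file.
# Thinning keeps a single point per rounded (expected, observed) cell, so
# points that would overplot each other are drawn only once.
render_qq <- function(trait, p, out_dir, digits = 2) {
  p <- sort(p[!is.na(p) & p > 0 & p < 1])
  expected <- -log10(ppoints(length(p)))
  observed <- -log10(p)
  keep <- !duplicated(cbind(round(expected, digits), round(observed, digits)))

  out_file <- file.path(out_dir, paste0(trait, "_qq.png"))
  png(out_file)            # each worker writes to its own device
  on.exit(dev.off())
  plot(expected[keep], observed[keep], pch = 20, main = trait,
       xlab = "Expected -log10(p)", ylab = "Observed -log10(p)")
  abline(0, 1, col = 2)
  out_file
}

# Toy data standing in for the real GWAS p-values (assumption: one numeric
# vector of p-values per trait).
set.seed(1)
pvals <- list(adhd = runif(1e5), bpd = runif(1e5), bmi = runif(1e5))

out_dir <- tempdir()
cl <- makeCluster(2)       # e.g. 6 on an 8-core machine
files <- clusterMap(cl, render_qq, names(pvals), pvals,
                    MoreArgs = list(out_dir = out_dir))
stopCluster(cl)
```

The per-trait PNG files can then be arranged into a 4 x 3 grid in a separate, serial step. Whether this is actually faster depends on how much time goes into transferring data to the workers and writing to disk, as noted above.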
    
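I don't think you can plot in parallel on the same device. If plotting takes that long, you are probably plotting a huge number of points (which cannot be told apart) in those plots. Consider how to avoid that.

@Roland Hi, thanks for your comment. You are right. I have GWAS (genome-wide association study) data, 12 sets of them, so they are very large, and all the data points (p-values) need to be plotted in the QQ (quantile-quantile) plots... this cannot be avoided. The total size of the 12 trellis objects together is about 650 MB.

I don't think you need to plot all the points. I have an R script that does 1 and 2, but it is tailored to my workflow. I am happy to share it.

This helps me a lot. Thank you very much. I would love to learn how to adapt points 1 and 2 to my workflow, so I would benefit greatly from seeing your script. How can I reach you?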