Multiple outputs to a single list input: merging BAM files in Nextflow


I am trying to merge x BAM files, produced by running several alignments at once (on batches of y fastq files), into a single BAM file in Nextflow.

So far, I have the following for running the alignment and then sorting/indexing the resulting BAM files:

//Run minimap2 on concatenated fastqs
process miniMap2Bam {
        publishDir "$params.bamDir"
        errorStrategy 'retry'
        cache 'deep'
        maxRetries 3
        maxForks 10
        memory { 16.GB * task.attempt }

        input:
        val dirString from dirStr
        val runString from stringRun
        each file(batchFastq) from fastqBatch.flatMap()

        output:
        val runString into stringRun1
        file("${batchFastq}.bam") into bamFiles
        val dirString into dirStrSam

        script:
        """
        minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
        samtools index ${batchFastq}.bam
        """
}
Here, ${batchFastq}.bam is a BAM file holding the alignments for one batch of y fastq files.

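However, when I try to run samtools merge on these BAM files in another process (samToolsMerge), that process runs once for every alignment run (4 in this case) instead of running once over all the collected BAM files: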

//Run samtools merge
process samToolsMerge {
        echo true
        publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
        cache 'deep'
        errorStrategy 'retry'
        maxRetries 3
        maxForks 10
        memory { 14.GB * task.attempt }

        input:
        val runString from stringRun1
        file bamFile from bamFiles.collect()
        val dirString from dirStrSam

        output:
        file("**")

        script:
        """
        samtools merge ${runString}.bam ${bamFile} 
        """
}

The output is:

executor >  lsf (9)
[49/182ec0] process > catFastqs (1)     [100%] 1 of 1 ✔
[-        ] process > nanoPlotSummary   -
[0e/609a7a] process > miniMap2Bam (1)   [100%] 4 of 4 ✔
[42/72469d] process > samToolsMerge (2) [100%] 4 of 4 ✔




Completed at: 04-Mar-2021 14:54:21
Duration    : 5m 41s
CPU hours   : 0.2
Succeeded   : 9
How can I take the BAM files produced by miniMap2Bam and run them through samToolsMerge once, rather than having the process run multiple times?

Thanks in advance.

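Edit: Thanks to Pallie in the comments below, the problem turned out to be that the runString and dirString values from an earlier process were fed into miniMap2Bam and then on into samToolsMerge, so the merge process repeated itself each time a value was passed along.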
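To illustrate why collect() on its own was not enough: collect() gathers every item of a channel into one list and emits that list once, so a process fed only by it runs a single time. In the original pipeline, samToolsMerge also read stringRun1 and dirStrSam, which each carried four values (one per miniMap2Bam task), and that is the likely reason it ran four times. A minimal sketch of the collect() behaviour, using made-up file names:

```nextflow
// Hypothetical file names, used only to show what collect() emits.
Channel
    .of('a.bam', 'b.bam', 'c.bam', 'd.bam')
    .collect()                       // gather all four items into a single list
    .view { "collected: $it" }       // prints: collected: [a.bam, b.bam, c.bam, d.bam]
```

Only one item comes out of the collected channel, however many BAM files went in.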

The solution is quite simple: just remove the vals from miniMap2Bam (as shown below):


The simplest fix is probably to stop passing the static dirString and runString through channels:

// Instead of a hardcoded path use a parameter you passed via CLI like you did with bamDir
dirString = file("/path/to/fastqs/")
runString = file("/path/to/fastqs/").getParent()
fastqBatch = Channel.from("/path/to/fastqs/")

//Run minimap2 on concatenated fastqs
process miniMap2Bam {
        publishDir "$params.bamDir"
        errorStrategy 'retry'
        cache 'deep'
        maxRetries 3
        maxForks 10
        memory { 16.GB * task.attempt }

        input:
        each file(batchFastq) from fastqBatch.flatMap()

        output:
        file("${batchFastq}.bam") into bamFiles

        script:
        """
        minimap2 --secondary=no --MD -2 -t 10 -a $params.genome ${batchFastq} | samtools sort -o ${batchFastq}.bam
        samtools index ${batchFastq}.bam
        """
}

//Run samtools merge
process samToolsMerge {
        echo true
        publishDir "$dirString/aligned_minimap/", mode: 'copy', overwrite: 'false'
        cache 'deep'
        errorStrategy 'retry'
        maxRetries 3
        maxForks 10
        memory { 14.GB * task.attempt }

        input:
        file bamFile from bamFiles.collect()

        output:
        file("**")

        script:
        """
        samtools merge ${runString}.bam ${bamFile}
        """
}


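What are runstring and dirstring, and what are you trying to achieve with them? It feels like the behaviour you are seeing comes from emitting the vals into channels. I think you are trying to force a certain flow, but not with the right tools that Nextflow provides.

Thanks for your reply. I believe you are right, these values are probably what is causing this behaviour. runstring is the parent directory of the directory that holds the fastq files, and dirstring is the parent directory of the fastq files themselves. These strings are created when the batches of fastq files are first read in, so that files can be published to the correct directory. I will try to come up with a workaround.

Thanks a lot @Pallie! Simply rerouting the values around the processes now solves this, so they are no longer fed into the merge process. I will edit the question to include the solution, although it is very straightforward. I would love to reward you, but I am not sure whether I can give a comment an upvote or accept it as the answer?

Hi again Pallie, I will accept this as the answer to my question, since the problem is now solved and it describes exactly what I needed to implement in my pipeline. Thanks.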
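For readers on newer Nextflow releases, where the from/into syntax used above is no longer available, the same idea can be sketched in DSL2. This is an illustration rather than the pipeline from the question: params.fastqDir, params.outDir, params.runName, the *.fastq glob, and the default value of params.genome are all hypothetical placeholders, and the retry/memory directives are left out for brevity.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameters; override them on the command line.
params.fastqDir = '/path/to/fastqs'
params.outDir   = 'results'
params.runName  = 'run1'
params.genome   = 'genome.fa'

// Align one batch of fastq reads and write a sorted BAM file.
process miniMap2Bam {
    input:
    path batchFastq

    output:
    path "${batchFastq}.bam"

    script:
    """
    minimap2 --secondary=no --MD -2 -t 10 -a ${params.genome} ${batchFastq} | samtools sort -o ${batchFastq}.bam
    """
}

// Merge all BAM files in a single task.
process samToolsMerge {
    publishDir "${params.outDir}/aligned_minimap", mode: 'copy'

    input:
    path bams        // the full list of BAM files, staged together
    val runName      // a plain value, not one item per alignment task

    output:
    path "${runName}.bam"

    script:
    """
    samtools merge ${runName}.bam ${bams}
    """
}

workflow {
    fastqs = Channel.fromPath("${params.fastqDir}/*.fastq")
    bams   = miniMap2Bam(fastqs)
    // collect() emits one list, and runName is a single value,
    // so samToolsMerge runs exactly once.
    samToolsMerge(bams.collect(), params.runName)
}
```

The design point is the same as in the answer above: static strings such as the run name are passed as plain values, so only the collected list of BAM files determines how often the merge step runs.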