Snakemake "Input files updated by another job" but it's not the case

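I have a problem with a pipeline and I don't know whether it is bad usage on my part or a bug. I am using snakemake 5.26.1. Until a few days ago I had no problems with this same version of snakemake, and I don't understand what has changed.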

The part of the pipeline in question comes right after a checkpoint rule and an aggregation step, which produce an output file, say foo.fasta.

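After foo.fasta has been generated, the rules that use this file as input are run again, with --reason reporting "Input files updated by another job". But that is not the case: their outputs are newer than foo.fasta.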
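To make the "Input files updated by another job" reason concrete, here is a toy model of make-style rerun propagation. It is a hypothetical sketch, not Snakemake's implementation: a job reruns if it is forced, if an output is missing, if any upstream job is already scheduled, or if an input is newer than an output. The "forced" entry only stands in for whatever makes the upstream job get scheduled (the question suspects the checkpoint); the file names echo the question, but the timestamps are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    inputs: list[str]
    outputs: list[str]
    deps: list[str] = field(default_factory=list)  # names of upstream jobs


def jobs_to_run(order: list[str], jobs: dict[str, Job],
                mtimes: dict[str, float], forced: set[str]) -> dict[str, str]:
    """Return {job name: reason} for every job that would be (re)run.

    `order` must be a topological order (upstream jobs first), so the fate of
    a job's dependencies is already decided when the job itself is examined.
    """
    reasons: dict[str, str] = {}
    for name in order:
        job = jobs[name]
        if name in forced:
            reasons[name] = "forced"
        elif any(out not in mtimes for out in job.outputs):
            reasons[name] = "Missing output files"
        elif any(dep in reasons for dep in job.deps):
            # The upstream job will rewrite our input, so this job must rerun
            # too, whatever the timestamps currently on disk say.
            reasons[name] = "Input files updated by another job"
        elif max(mtimes[i] for i in job.inputs) > min(mtimes[o] for o in job.outputs):
            reasons[name] = "Updated input files"
    return reasons


# Hypothetical modification times: every output is newer than its input.
MTIMES = {
    "tros_v4.pseudohap.fa": 50.0,
    "tros_v5.pseudohap.fasta.gz": 100.0,  # plays the role of foo.fasta
    "tros_v5_comp-main.mx": 200.0,
    "tros_v5_spectra.pdf": 300.0,
}
JOBS = {
    "agouti_scaffolding": Job("agouti_scaffolding", ["tros_v4.pseudohap.fa"],
                              ["tros_v5.pseudohap.fasta.gz"]),
    "kat_comp": Job("kat_comp", ["tros_v5.pseudohap.fasta.gz"],
                    ["tros_v5_comp-main.mx"], deps=["agouti_scaffolding"]),
    "kat_plot_spectra": Job("kat_plot_spectra", ["tros_v5_comp-main.mx"],
                            ["tros_v5_spectra.pdf"], deps=["kat_comp"]),
}
ORDER = ["agouti_scaffolding", "kat_comp", "kat_plot_spectra"]

# Timestamps alone schedule nothing ...
print(jobs_to_run(ORDER, JOBS, MTIMES, forced=set()))  # {}
# ... but once the upstream job is scheduled, every downstream job follows.
print(jobs_to_run(ORDER, JOBS, MTIMES, forced={"agouti_scaffolding"}))
```

In this model the downstream reason appears even though the timestamps on disk are perfectly ordered, which is the symptom described above.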

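Moreover, with the --summary option, foo.fasta is marked as "no update", and the files have the correct timestamp order, so the rules should not have to be run again.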
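The timestamp order that --summary reports can also be checked by hand. A minimal sketch, assuming only the plain make rule (an output is outdated if it is missing or older than any of its inputs); Snakemake weighs other reasons as well, and is_outdated is a hypothetical helper name.

```python
from __future__ import annotations

import os
import tempfile


def is_outdated(output: str, inputs: list[str]) -> bool:
    """Make-style check: True if `output` is missing or older than any input."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(path) > out_mtime for path in inputs)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "foo.fasta")
        mx = os.path.join(tmp, "foo_comp-main.mx")  # stands in for a downstream output
        for path in (fasta, mx):
            open(path, "w").close()
        os.utime(fasta, (100, 100))  # input written first ...
        os.utime(mx, (200, 200))     # ... downstream output written later
        print(is_outdated(mx, [fasta]))  # False: timestamps are in order
        os.utime(fasta, (300, 300))  # now touch the input after the output
        print(is_outdated(mx, [fasta]))  # True
```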

I can't find why the following rules are run again. Could the checkpoint make snakemake think that foo.fasta was updated when it was not?

Here is a dry-run output. The rule agouti_scaffolding produces tros_v5.pseudohap.fasta.gz (my foo.fasta).

It seems the problem comes from the update attributed to job kat_comp. Even the ancient flag on the kat_comp input doesn't change anything.
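According to the Snakemake documentation, wrapping an input in ancient() makes it be treated as older than any output, so its modification date alone should not trigger a rerun. The sketch below models only that timestamp aspect (needs_rerun and its ancient argument are hypothetical names); it does not model how ancient interacts with reruns propagated from upstream jobs.

```python
from __future__ import annotations

import os
import tempfile


def needs_rerun(output: str, inputs: list[str],
                ancient: frozenset[str] = frozenset()) -> bool:
    """Timestamp check in which inputs listed in `ancient` never count as newer."""
    if not os.path.exists(output):
        return True
    out_mtime = os.path.getmtime(output)
    return any(
        os.path.getmtime(path) > out_mtime
        for path in inputs
        if path not in ancient  # ancient inputs are skipped entirely
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fasta = os.path.join(tmp, "foo.fasta.gz")
        mx = os.path.join(tmp, "foo_comp-main.mx")
        for path in (fasta, mx):
            open(path, "w").close()
        os.utime(mx, (100, 100))     # output is older ...
        os.utime(fasta, (200, 200))  # ... than the input
        print(needs_rerun(mx, [fasta]))                              # True
        print(needs_rerun(mx, [fasta], ancient=frozenset({fasta})))  # False
```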

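EDIT

Here is part of the pipeline. As suggested by @TroyComi, the latest development version (5.26.1+26.gc2e2b501.dirty) fixes the problem with the ancient directive. Without the ancient directive, the rules are still executed.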

import glob
import os
import re

rule all:
    input:
        "results/kat/tros_v5/tros_v5_spectra.pdf"


checkpoint split_fa_augustus:
    input:
        "results/fasta/{sample}_v4.pseudohap.fasta.gz"
    output:
        "results/agouti/{sample}/{sample}_v4.pseudohap.fa",
        directory("results/agouti/{sample}/split")
    params:
        split_size = 50000000
    conda:
        "../envs/augustus.yaml"
    shell:
        """
        zcat {input} > {output[0]}
        mkdir {output[1]}
        splitMfasta.pl {output[0]} \
        --outputpath={output[1]} --minsize={params.split_size}
        """

rule augustus:
    input:
        "results/agouti/{sample}/split/{sample}_v4.pseudohap.split.{i}.fa"
    output:
        "results/agouti/{sample}/split/pred_{i}.gff3"
    conda:
        "../envs/augustus.yaml"
    shell:
        """
        augustus --gff3=on --species=caenorhabditis {input} > {output}
        """

def aggregate_input_gff3(wildcards):
    checkpoint_output = checkpoints.split_fa_augustus.get(**wildcards).output[1]
    return expand("results/agouti/{sample}/split/pred_{i}.gff3",
           sample=wildcards.sample,
           i=glob_wildcards(os.path.join(checkpoint_output, f"{wildcards.sample}_v4.pseudohap.split." + "{i}.fa")).i)

rule aggregate_gff3:
    input:
        aggregate_input_gff3
    output:
        "results/agouti/{sample}/{sample}_v4.pseudohap.gff3"
    conda:
        "../envs/augustus.yaml"
    shell:
        "cat {input} | join_aug_pred.pl > {output}"

#===============================
# Preprocess RNAseq data

# The RNAseq reads need to be in the folder resources/RNAseq_raw/{sample}
# Files must be named {run}_R1.fastq.gz and {run}_R2.fastq.gz for globbing to work
# globbing is done in the rule merge_RNA_bams

rule rna_rcorrector:
    input:
        expand("resources/RNAseq_raw/{{sample}}/{{run}}_{R}.fastq.gz",
            R=['R1', 'R2'])
    output:
        temp(expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_{R}.cor.fq.gz",
            R=['R1', 'R2']))
    params:
        outdir = lambda w, output: os.path.dirname(output[0])
    log:
        "logs/rcorrector_{sample}_{run}.log" 
    threads:
        config['agouti']['threads']
    conda:
        "../envs/rna_seq.yaml"
    shell:
        """
        run_rcorrector.pl -1 {input[0]} -2 {input[1]} \
        -t {threads} \
        -od {params.outdir} \
        > {log} 2>&1
        """

rule rna_trimgalore:
    input:
        expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_{R}.cor.fq.gz",
            R=['R1', 'R2'])
    output:
        temp(expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_trimgal_val_{i}.fq",
            i=['1', '2']))
    params:
        outdir = lambda w, output: os.path.dirname(output[0]),
        basename = lambda w: f'{w.run}_trimgal'
    log:
        "logs/trimgalore_{sample}_{run}.log" 
    threads:
        config['agouti']['threads']
    conda:
        "../envs/rna_seq.yaml"
    shell: 
        """
        trim_galore --cores {threads} \
        --phred33 \
        --quality 20 \
        --stringency 1 \
        -e 0.1 \
        --length 70 \
        --output_dir {params.outdir} \
        --basename {params.basename} \
        --dont_gzip \
        --paired \
        {input} \
        > {log} 2>&1
        """

#===============================
# Map the RNAseq reads
rule index_ref:
    input: 
        "results/agouti/{sample}/{sample}_v4.pseudohap.fa"
    output:
        multiext("results/agouti/{sample}/{sample}_v4.pseudohap.fa",
            ".0123", ".amb", ".ann", ".bwt.2bit.64", ".bwt.8bit.32", ".pac")
    conda:
        "../envs/mapping.yaml"
    shell:
        "bwa-mem2 index {input}"

rule map_RNAseq:
    input: 
        expand("results/agouti/{{sample}}/RNA_preproc/{{run}}_trimgal_val_{i}.fq",
            i=['1', '2']),
        "results/agouti/{sample}/{sample}_v4.pseudohap.fa",
        multiext("results/agouti/{sample}/{sample}_v4.pseudohap.fa",
            ".0123", ".amb", ".ann", ".bwt.2bit.64", ".bwt.8bit.32", ".pac")
    output:
        "results/agouti/{sample}/mapping/{run}.bam"
    log:
        "logs/bwa_rna_{sample}_{run}.log"
    conda:
        "../envs/mapping.yaml"
    threads:
        config['agouti']['threads']
    shell:
        """
        bwa-mem2 mem -t {threads} {input[2]} {input[0]} {input[1]} 2> {log} \
        | samtools view -b -@ {threads} -o {output}
        """

def get_sample_rna_runs(w):
    list_R1_files = glob.glob(f"resources/RNAseq_raw/{w.sample}/*_R1.fastq.gz")
    # raw string so the regex escapes are not treated as (invalid) string escapes
    list_runs = [re.sub(r'_R1\.fastq\.gz$', '', os.path.basename(f)) for f in list_R1_files]
    return [f'results/agouti/{w.sample}/mapping/{run}.bam' for run in list_runs]

rule merge_RNA_bams:
    input:
        get_sample_rna_runs
    output:
        "results/agouti/{sample}/RNAseq_mapped_merged.bam"
    params:
        tmp_merge = lambda w: f'results/agouti/{w.sample}/tmp_merge.bam'
    conda:
        "../envs/mapping.yaml"
    threads:
        config['agouti']['threads']
    shell:
        """
        samtools merge -@ {threads} {params.tmp_merge} {input}               
        samtools sort -@ {threads} -n -o {output} {params.tmp_merge}
        rm {params.tmp_merge}
        """

#===============================
# Run agouti on all that
rule agouti_scaffolding:
    input: 
        fa = "results/agouti/{sample}/{sample}_v4.pseudohap.fa",
        bam = "results/agouti/{sample}/RNAseq_mapped_merged.bam",
        gff = "results/agouti/{sample}/{sample}_v4.pseudohap.gff3"
    output: 
        protected("results/fasta/{sample}_v5.pseudohap.fasta.gz")
    params:
        outdir = lambda w: f'results/agouti/{w.sample}/agouti_out',
        minMQ = 20,
        maxFracMM = 0.05
    log: 
        "logs/agouti_{sample}.log"
    conda: 
        "../envs/agouti.yaml"
    shell:
        """
        python /opt/agouti/agouti.py scaffold \
        -assembly {input.fa} \
        -bam {input.bam} \
        -gff {input.gff} \
        -outdir {params.outdir} \
        -minMQ {params.minMQ} -maxFracMM {params.maxFracMM} \
        > {log} 2>&1

        gzip -c {params.outdir}/agouti.agouti.fasta > {output}
        """

#===============================================================
# Now do something on output {sample}_{version}.pseudohap.fasta.gz

rule kat_comp:
    input:
        expand("results/preprocessing/{{sample}}/{{sample}}_dedup_proc_fastp_{R}_001.fastq.gz",
            R=["R1", "R2"]),
        ancient("results/fasta/{sample}_{version}.pseudohap.fasta.gz")
    output:
        "results/kat/{sample}_{version}/{sample}_{version}_comp-main.mx"
    params:
        outprefix = lambda w: f'results/kat/{w.sample}_{w.version}/{w.sample}_{w.version}_comp'
    log:
        "logs/kat_comp.{sample}_{version}.log"
    conda:
        "../envs/kat.yaml"
    threads:
        16
    shell:
        """
        kat comp -t {threads} \
        -o {params.outprefix} \
        '{input[0]} {input[1]}' \
        {input[2]} \
        > {log} 2>&1
        """

rule kat_plot_spectra:
    input:
        "results/kat/{sample}_{version}/{sample}_{version}_comp-main.mx"
    output:
        "results/kat/{sample}_{version}/{sample}_{version}_spectra.pdf"
    params:
        title = lambda w: f'{w.sample}_{w.version}'
    log:
        "logs/kat_plot.{sample}_{version}.log"
    conda:
        "../envs/kat.yaml"
    shell:
        """
        kat plot spectra-cn \
        -o {output} \
        -t {params.title} \
        {input} \
        > {log} 2>&1
        """
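The aggregation step relies on glob_wildcards to discover which {i} chunks the checkpoint produced. A rough pure-Python equivalent, assuming the file naming used in aggregate_input_gff3 and Snakemake's default wildcard pattern .+, is sketched below; unlike glob_wildcards it only scans the top level of the directory, and split_indices / aggregate_paths are hypothetical helper names.

```python
from __future__ import annotations

import os
import re
import tempfile


def split_indices(split_dir: str, sample: str) -> list[str]:
    """Collect {i} from files named '{sample}_v4.pseudohap.split.{i}.fa'."""
    prefix = re.escape(f"{sample}_v4.pseudohap.split.")
    pattern = re.compile(prefix + r"(?P<i>.+)\.fa$")  # '.+' like the default wildcard
    indices = []
    for name in sorted(os.listdir(split_dir)):  # sorted only for a stable order
        match = pattern.match(name)
        if match:
            indices.append(match.group("i"))
    return indices


def aggregate_paths(split_dir: str, sample: str) -> list[str]:
    """One pred_{i}.gff3 path per split chunk, as aggregate_input_gff3 builds them."""
    return [f"results/agouti/{sample}/split/pred_{i}.gff3"
            for i in split_indices(split_dir, sample)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("tros_v4.pseudohap.split.1.fa", "tros_v4.pseudohap.split.2.fa",
                     "pred_1.gff3"):  # non-matching files are ignored
            open(os.path.join(tmp, name), "w").close()
        print(aggregate_paths(tmp, "tros"))
```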

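You are probably right that it's the checkpoint. Can you update your question to include the relevant parts of your Snakefile? If you want a quick workaround, try ancient. That also seems to have only started recently. Check the behavior with the latest version, and if ancient doesn't work, consider opening an issue.

Thanks for pointing me in the right direction, the ancient directive now works in the latest development version. Do you know whether the checkpoint behavior is intentional or a snakemake bug? I don't remember having this problem before.

Honestly, I could see it either way. You could argue that it should update the jobs because the DAG has to be re-evaluated, and if you don't want that behavior you can mark the input as ancient. You could also argue that not updating fits the make paradigm better. If you can put together a small example showing the behavior and what you expect, that would make a more effective issue.

With the small examples from the documentation (the second one, with clustering), the problem does not occur. So I think there is a problem in my pipeline.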