Merging multiple VCF files with Snakemake
I am trying to merge several VCF files by chromosome using Snakemake. My files look like this; as you can see, each one covers a different coordinate range. What is the best way to merge all the chr1A files into one VCF and all the chr1B files into another?
chr1A:0-2096.filtered.vcf
chr1A:2096-7896.filtered.vcf
chr1B:0-3456.filtered.vcf
chr1B:3456-8796.filtered.vcf
My pseudocode:
chromosomes=["chr1A","chr1B"]

rule all:
    input:
        expand("{sample}.vcf", sample=chromosomes)

rule merge:
    input:
        I1="path/to/file/{sample}.xxx.filtered.vcf",
        I2="path/to/file/{sample}.xxx.filtered.vcf",
    output:
        outf ="{sample}.vcf"
    shell:
        """
        java -jar picard.jar GatherVcfs I={input.I1} I={input.I2} O={output.outf}
        """
Edit:
I can reproduce your error. It works once the wildcard is constrained:
d = {"chr1A": ["chr1A:0-2096.flanking.view.filtered.vcf", "chr1A:2096-7896.flanking.view.filtered.vcf"],
     "chr1B": ["chr1B:0-3456.flanking.view.filtered.vcf", "chr1B:3456-8796.flanking.view.filtered.vcf"]}

chromosomes = list(d)

rule all:
    input:
        expand("{sample}.vcf", sample=chromosomes)

# these tell Snakemake exactly what values the wildcards may take
# we use "|" to create the regex chr1A|chr1B
wildcard_constraints:
    chromosome = "|".join(chromosomes)

rule merge:
    input:
        # a lambda is an unnamed function
        # the first argument is the wildcards
        # we merely use it to look up the appropriate files in the dict d
        lambda w: d[w.chromosome]
    output:
        outf = "{chromosome}.vcf"
    params:
        # here we create the string
        # "I=chr1A:0-2096.flanking.view.filtered.vcf I=chr1A:2096-7896.flanking.view.filtered.vcf"
        # for use in our command
        lambda w: "I=" + " I=".join(d[w.chromosome])
    shell:
        "java -jar /home/Documents/Tools/picard.jar GatherVcfs {params[0]} O={output.outf}"
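The dict d above is typed out by hand. As a rough sketch, assuming every file name follows the pattern <chromosome>:<start>-<end>.<anything>.vcf, it could be built from a list of file names instead. The helper names group_by_chromosome and gather_inputs are made up for this illustration; they are not part of Snakemake or Picard. The files are sorted by start coordinate because, as far as I know, GatherVcfs expects its inputs in genomic order.

```python
import re
from collections import defaultdict

# Assumed file-name pattern: <chromosome>:<start>-<end>.<anything>.vcf
INTERVAL_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)\.(?:.*\.)?vcf$"
)


def group_by_chromosome(filenames):
    """Group interval VCFs by chromosome, ordered by start coordinate."""
    grouped = defaultdict(list)
    for name in filenames:
        match = INTERVAL_RE.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        grouped[match["chrom"]].append((int(match["start"]), name))
    # Sort each chromosome's files by their numeric start coordinate.
    return {
        chrom: [name for _, name in sorted(items)]
        for chrom, items in grouped.items()
    }


def gather_inputs(files):
    """Build the "I=a.vcf I=b.vcf" argument string used in the shell command."""
    return " ".join(f"I={f}" for f in files)


# Hypothetical, deliberately unsorted input list.
files = [
    "chr1B:3456-8796.filtered.vcf",
    "chr1A:2096-7896.filtered.vcf",
    "chr1A:0-2096.filtered.vcf",
    "chr1B:0-3456.filtered.vcf",
]
d = group_by_chromosome(files)
print(d["chr1A"])
print(gather_inputs(d["chr1A"]))
```

In a Snakefile the list could come from something like glob.glob("*.filtered.vcf") rather than being written out. That only works if the interval files already exist before Snakemake starts.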
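It should also work without the constraint; this seems to be a bug in Snakemake.

Comments:
Hi, this gives me the following error: KeyError: 'chr1A:0-2096.filtered'
Actually, I am using exactly your code :) I only added chr1A because my files are named chr1A:0-2096.filtered.vcf (but the error persists even if I remove chr1A), and I added all the other chromosomes…
See the edit above.
It reports an InputFunctionException in line 35, where there is input: f. I don't know, because I don't understand 80% of the code and I am confused… sorry…
If I indent print(w), it gives the following: KeyError: '2096-7896.filtered'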
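One possible reading of the KeyError in the comments: by default a Snakemake wildcard matches the regex .+, so the output pattern "{chromosome}.vcf" also matches the input files themselves, and the input function is then called with a key that is not in d. The sketch below imitates that matching with plain re. It is an approximation for illustration, not Snakemake's actual internals, and it uses the file name from the question.

```python
import re

chromosomes = ["chr1A", "chr1B"]

# Rough stand-ins for how the output pattern "{chromosome}.vcf" is matched:
# without a constraint the wildcard is ".+", with one it is "chr1A|chr1B".
unconstrained = re.compile(r"^(?P<chromosome>.+)\.vcf$")
constrained = re.compile(
    r"^(?P<chromosome>" + "|".join(chromosomes) + r")\.vcf$"
)

input_file = "chr1A:0-2096.filtered.vcf"

# Unconstrained: the input file looks like something rule merge could
# produce, and the wildcard takes a value that is not a key of d.
m = unconstrained.match(input_file)
print(m["chromosome"])  # chr1A:0-2096.filtered

# Constrained: the input file no longer matches the output pattern,
# while the real targets still do.
print(constrained.match(input_file))  # None
print(constrained.match("chr1A.vcf")["chromosome"])  # chr1A
```

If that reading is right, the constraint is what stops rule merge from being asked to create its own input files.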