Python: accessing the prefix of a config list element to get a value from a dictionary

Tags: python, bioinformatics, snakemake, vcf-variant-call-format

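I'm trying to build a pipeline with snakemake and I'm having trouble accessing the values of the "small_reference" dictionary in my config file. Depending on the sample, I want to use a different reference for the alignment.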

Config file:

samples: ['C130165', 'C014044p', 'C130166', 'C130157', 'C014040p', 'C014054b-1', 'C051198-A', 'C014042p', 'C052007W-C', 'C130167', 'C051198-B', 'C130157A', 'C130165A', 'C014038p', 'C052004-B', 'C051198-C', 'C052004-C', 'C130167', 'C052003-B', 'C130165', 'C052003-A', 'C052004-A', 'C052002-C', 'C130157', 'C052005-C', 'C130157W', 'C130167A', 'C130157A', 'C130166A', 'C052002-A', 'C130157N', 'C052006-B', 'C014063pW', 'C130157W', 'C130157N', 'C014054b-2', 'C052002-B', 'C130167A', 'C052006-C', 'C130166A', 'C052007W-B', 'C052003-C', 'C130165A', 'C014064bW', 'C052005-B', 'C130166', 'C052006-A', 'C052005-A']


reference: "/mnt/storage/refs/human_1kg/human_g1k_v37.fasta"

index: "/mnt/storage/refs/human_1kg/human_g1k_v37.fasta.fai"

dbsnp: "/mnt/storage/refs/human_1kg/dbsnp_137.b37.vcf"

small_reference: {
    C01: "/mnt/storage/projects/hiv_data/refs/BRCA/BRCA12_PALB2.fasta",
    Z01: "/mnt/storage/projects/hiv_data/refs/BRCA/BRCA12.fasta",
    C02: "/mnt/storage/projects/hiv_data/refs/STICKLERS/STICKERS_ext.fasta",
    C03: "/mnt/storage/projects/hiv_data/refs/TS/TS.fasta",
    C04: "/mnt/storage/projects/hiv_data/refs/STICKLERS/STICKERS.fasta",
    C05: "/mnt/storage/projects/hiv_data/refs/PKD_GANAB/PKD.fasta",
    C07: "/mnt/storage/projects/hiv_data/refs/NEMO/NEMO.fasta",
    C08: "/mnt/storage/projects/hiv_data/refs/HNPCC/HNPCC.fasta",
    C09: "/mnt/storage/projects/hiv_data/refs/TAU/TAU.fasta",
    C10: "/mnt/storage/projects/hiv_data/refs/THYROID/THYROID.fasta",
    C12: "/mnt/storage/projects/hiv_data/refs/VWF/VWF.fasta",
    C13: "/mnt/storage/refs/human_1kg/human_g1k_v37.fasta",
    C17: "/mnt/storage/projects/hiv_data/refs/DICER_PALB2/DICER_PALB2.fasta",
    C18: "/mnt/storage/projects/hiv_data/refs/DICER_PALB2/DICER_PALB2.fasta",
}

baits: {
    C01: "/mnt/storage/projects/hiv_data/refs/BRCA/BRCA12_PALB2.bed",
    Z01: "/mnt/storage/projects/hiv_data/refs/BRCA/BRCA12_exons.bed",
    C02: "/mnt/storage/projects/hiv_data/refs/STICKLERS/STICKERS_ext.bed",
    C03: "/mnt/storage/projects/hiv_data/refs/TS/TS_exons.bed",
    C04: "/mnt/storage/projects/hiv_data/refs/STICKLERS/STICKERS.bed",
    C05: "/mnt/storage/projects/hiv_data/refs/PKD_GANAB/PKD.bed",
    C07: "/mnt/storage/projects/hiv_data/refs/NEMO/NEMO.bed",
    C08: "/mnt/storage/projects/hiv_data/refs/HNPCC/HNPCC.bed",
    C09: "/mnt/storage/projects/hiv_data/refs/TAU/TAU.bed",
    C10: "/mnt/storage/projects/hiv_data/refs/THYROID/THYROID_v2.bed",
    C12: "/mnt/storage/projects/hiv_data/refs/VWF/VWF.bed",
    C13: "/mnt/storage/refs/human_1kg/human_g1k_v37.bed",
    C17: "/mnt/storage/projects/hiv_data/refs/DICER_PALB2/DICER_PALB2.bed",
    C18: "/mnt/storage/projects/hiv_data/refs/DICER_PALB2/DICER_PALB2.bed",
}
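Depending on the first 3 characters of the sample name, I want to select a different reference. I wrote a function that does this when config["samples"] is a single string. Now I want to process whole run folders, so config["samples"] is a list of samples.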

The function that worked for a single sample:

def get_ref(wildcards):
    prefix = config["samples"][0:3]
    return config["small_reference"][prefix]
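When I only changed the config file (that is, while running the whole pipeline), the first error I got in a rule was:
Duplicate output file pattern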

Test rule:

rule test:
    input:
        fq = expand("{sample}.1.fq.gz", sample = config["samples"]),
        ref = get_ref
    shell:
        "echo {input.fq} {input.ref}"
Now, when running the test rule, I get the following error:

InputFunctionException in line 17 of /mnt/storage/home/kimy/projects/automate_CP/scripts/Snakefile:
TypeError: unhashable type: 'list'
Wildcards:

Example: C014038p --> C01 --> /mnt/storage/projects/hiv_data/refs/BRCA/BRCA12_PALB2.fasta


How can I get the correct "small reference" based on the prefix of the sample being analysed by the pipeline?

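It turned out that my rule all was wrong. expand() belongs in rule all, where it lists the target files for every sample; it should not be repeated in every rule. The other rules then receive one sample at a time through the {sample} wildcard.
Modified, working function: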

def get_ref(wildcards):
    prefix = wildcards.sample[0:3]
    return config["small_reference"][prefix]
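
To make the difference between the two versions concrete, here is a minimal sketch in plain Python, outside of snakemake. The trimmed `config` dictionary, the name `get_ref_broken`, and the use of `SimpleNamespace` as a stand-in for the wildcards object are assumptions made for illustration only. Slicing a list with `[0:3]` returns the first three samples as a list, and a list cannot be used as a dictionary key, which is where the TypeError comes from.

```python
from types import SimpleNamespace

# Trimmed stand-in for the parsed config file (only two entries kept).
config = {
    "samples": ["C014038p", "C130165"],
    "small_reference": {
        "C01": "/mnt/storage/projects/hiv_data/refs/BRCA/BRCA12_PALB2.fasta",
        "C13": "/mnt/storage/refs/human_1kg/human_g1k_v37.fasta",
    },
}


def get_ref_broken(wildcards):
    # config["samples"] is a list, so [0:3] yields up to three samples,
    # not the first three characters of one sample name.
    prefix = config["samples"][0:3]
    # A list is unhashable, so this dictionary lookup raises TypeError.
    return config["small_reference"][prefix]


def get_ref(wildcards):
    # wildcards.sample is a single string, so [0:3] is its 3-character prefix.
    prefix = wildcards.sample[0:3]
    return config["small_reference"][prefix]


# Hypothetical stand-in for the object snakemake passes to input functions.
wildcards = SimpleNamespace(sample="C014038p")

try:
    get_ref_broken(wildcards)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(get_ref(wildcards))
```

The fixed function only works inside a rule whose input and output patterns actually contain a {sample} wildcard; otherwise `wildcards.sample` does not exist.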

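prefix = [a[0:3] for a in config["samples"]]

I don't think that helps me. I've edited my question to make it clearer: my end goal is to get the correct "small reference" based on the prefix of the sample being analysed by the pipeline.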
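expand() is used to create a list, nothing more. Wildcards can be thought of as "variables". Because snakemake is file-based (inputs and outputs), these "variables" have to be defined somewhere. That is usually rule all (the first rule defined in the Snakefile), because snakemake first looks at the target files it has to create and then goes through the rules to work out how to create them.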
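
The point that expand() only builds a list can be illustrated with a simplified stand-in written in plain Python. `expand_sketch` is a hypothetical helper written for this example, not snakemake's implementation, and it only covers the basic case of filling a pattern with every combination of the given values.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


samples = ["C014038p", "C130165"]

# The result is an ordinary list of file names, one per sample.
targets = expand_sketch("{sample}.1.fq.gz", sample=samples)
print(targets)
```

Listing such targets in rule all tells snakemake which files to build; a per-sample rule whose output pattern contains {sample} is then matched against each file name, and that match is what gives `wildcards.sample` its value.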