Using Snakemake checkpoints + aggregate to fetch an unknown number of IDs from the internet


Here is a simplified example of what I am trying to do:

rule all:
    input: "results.txt"

rule find_data:
    output: "work_dir/data.txt"
    run:
        # pretend this retrieves IDs
        with open(output[0], 'w') as fh:
            for i in map(str, range(5)):
                print(i, file=fh)

checkpoint download_data:
    input: "work_dir/data.txt"
    output: directory("work_dir/{sample_id}")
    run:
        with open(input[0], 'r') as fh:
            for l in fh:
                l = l.rstrip()
                # pretend this downloads data
                shell("touch work_dir/{}".format(l))


def aggregate_signatures(wildcards):
    checkpoint_output = checkpoints.download_data.get(**wildcards).output[0]
    return expand("work_dir/{sample_id}", sample_id=checkpoint_output.sample_id)

rule make_database:
    input: aggregate_signatures
    output: "results.txt"
    shell:
         "cat {input} > {output}"

This results in the error:

InputFunctionException in line 40 of /Users/ian.fiddes/test_sourmash/sourmash/Snakefile:
WorkflowError: Missing wildcard values for sample_id
Wildcards:

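I am not sure how the code above differs functionally from the example in the "Data-dependent conditional execution" section of the Snakemake manual.

The idea is to fetch an unknown number of IDs from the internet, run one job per ID that downloads some data, and then process that data in intermediate steps before aggregating.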

You are almost there.

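The rule that should make Snakemake re-evaluate the DAG is not download_data but find_data, because it is only after find_data has run that you can check which IDs you have. So making find_data a checkpoint and changing the corresponding line in aggregate_signatures is all you need to do: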

checkpoint_output = checkpoints.find_data.get(**wildcards).output[0]
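
For completeness, here is a minimal sketch of how the whole input function might look once find_data is the checkpoint. It is not taken from the original answer: the helper name paths_for_ids is hypothetical, and the sketch assumes the return line of aggregate_signatures is also changed to derive the IDs from the contents of work_dir/data.txt, and that download_data is rewritten as an ordinary rule producing one file per sample_id.

```python
def paths_for_ids(lines, pattern="work_dir/{sample_id}"):
    """Turn an iterable of ID lines into one target path per non-empty ID.

    Hypothetical helper, kept separate so it can be tried outside Snakemake.
    """
    ids = [line.strip() for line in lines]
    return [pattern.format(sample_id=i) for i in ids if i]


def aggregate_signatures(wildcards):
    # `checkpoints` is a global that Snakemake provides inside a Snakefile.
    # Calling .get() tells Snakemake to run find_data first and then
    # re-evaluate the DAG before this function is evaluated again.
    id_file = checkpoints.find_data.get(**wildcards).output[0]
    with open(id_file) as fh:
        return paths_for_ids(fh)
```

With the five IDs written by find_data, this would request work_dir/0 through work_dir/4 as inputs to make_database. One thing to watch for under these assumptions: a per-ID rule with output "work_dir/{sample_id}" can also match work_dir/data.txt, so it may be necessary to constrain the wildcard or write the per-ID files to a different directory.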