Python: using directories as wildcards when using checkpoints (Snakemake)


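I am trying to use subdirectories as wildcards, but snakemake expands the wildcard into the subdirectories. I tried to come up with a minimal example, which was not easy. I apologise if the example is not that clear. It should work out of the box, though.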

Pipeline description

rule firststep: this rule basically creates two folders for the wildcard runs.

runs = ['run1', 'run2']

rule firststep:
    output:
        '{run}/firststep_done.txt'
    shell:
        'touch {output} ;'

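checkpoint secondstep: this rule outputs an arbitrary number of subdirectories, which are later used as wildcards (projectA and projectB). Within each subdirectory, an arbitrary number of files is generated.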
checkpoint secondstep:
    input:
        '{run}/firststep_done.txt',
    output:
        DIR = directory('{run}/secondstep')
    shell:
        'mkdir -p {output.DIR} ;'
        'mkdir -p {wildcards.run}/secondstep/projectA ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.2 ;'
        'mkdir -p {wildcards.run}/secondstep/projectB ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.2 ;'

rule intermediate: this rule uses the new wildcard project to create a file in another directory, where the subdirectory name is the project wildcard.
rule intermediate:
    input:
        directory('{run}/secondstep/{project}')
    output:
        '{run}/report/{project}/arbitrary.all'
    shell:
        'echo "foo" > {output}'

As a next step, I create an input function for the aggregate rule:
def resolve_project(wildcards):
    checkpoint_output=checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                 "{project}")).project)

Then the final rule, aggregate, uses the input created by the function to finish the pipeline:
rule aggregate:
    input:
        resolve_project
    output:
        '{run}/report/{run}_done'
    shell:
        'cat {input} > {output}'

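I have posted the full copy-and-paste pipeline below.
I see two problems:
First, the wildcards for the intermediate rule are, for example:

wildcards: run=run1, project=projectB/file_arbitrary.2

But I want the wildcard {project} to be only projectA or projectB. How can I achieve this?
Second, since snakemake creates a .snakemake_timestamp in the folder secondstep, I also get a wildcard named .snakemake_timestamp. How can I make snakemake infer the wildcards from directories only?
Thanks for your help.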
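To see why {project} picks up projectB/file_arbitrary.2 and .snakemake_timestamp, it can help to emulate the matching in plain Python. The sketch below is not Snakemake code. It assumes, in line with the Snakemake documentation, that an unconstrained wildcard behaves like the regular expression .+ and that glob_wildcards considers every path below the directory, nested files included. The helper names build_example and match_project are made up for this illustration.

```python
import os
import re
import tempfile


def build_example(root):
    """Recreate the layout written by checkpoint secondstep (hypothetical helper)."""
    for project in ("projectA", "projectB"):
        os.makedirs(os.path.join(root, project))
        for i in (1, 2):
            open(os.path.join(root, project, f"file_arbitrary.{i}"), "w").close()
    # Marker file reported in the question as appearing inside the directory output.
    open(os.path.join(root, ".snakemake_timestamp"), "w").close()


def match_project(root, constraint=".+"):
    """Emulate matching '<root>/{project}' against every path below root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            # The wildcard value is valid only if the constraint matches it in full.
            if re.fullmatch(constraint, rel):
                hits.append(rel)
    return sorted(hits)


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "secondstep")
    build_example(root)
    unconstrained = match_project(root)

for value in unconstrained:
    print(value)
```

With .+ the result contains the two directories, the four nested files and the marker file, which corresponds to the unwanted wildcard values described above.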
Full pipeline:
runs = ['run1', 'run2']

rule all:
    input:
        expand('{run}/report/{run}_done', run = runs)

rule firststep:
    output:
        '{run}/firststep_done.txt'
    shell:
        'touch {output} ;'


checkpoint secondstep:
    input:
        '{run}/firststep_done.txt',
    output:
        DIR = directory('{run}/secondstep')
    shell:
        'mkdir -p {output.DIR} ;'
        'mkdir -p {wildcards.run}/secondstep/projectA ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.2 ;'
        'mkdir -p {wildcards.run}/secondstep/projectB ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.2 ;'

rule intermediate:
    input:
        directory('{run}/secondstep/{project}')
    output:
        '{run}/report/{project}/arbitrary.all'
    shell:
        'echo "blabla" > {output}'


def resolve_project(wildcards):
    checkpoint_output=checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                 "{project}")).project)

rule aggregate:
    input:
        resolve_project
    output:
        '{run}/report/{run}_done'
    shell:
        'cat {input} > {output}'


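EDIT
As the answer below points out, this is a matter of wildcard constraints. However, a global wildcard constraint did not work. Because an input function is involved, the constraint has to be defined in the glob_wildcards statement: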
def resolve_project(wildcards):
    checkpoint_output=checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                 "{project, [^/|^.]+}")).project)

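The constraint from the edit can be checked in isolation with Python's re module. This is a minimal sketch, assuming the constraint has to match the whole wildcard value. The candidate strings come from the example layout, except project.v2, which is a hypothetical name added to show a side effect: inside a character class, | and the second ^ are literal characters, so [^/|^.]+ rejects any value containing /, |, ^ or a dot anywhere. ALTERNATIVE is an assumption of this sketch, not part of the original fix; it only forbids slashes and a leading dot.

```python
import re

CONSTRAINT = r"[^/|^.]+"       # as used in the edited input function
ALTERNATIVE = r"[^/.][^/]*"    # assumption: only forbid '/' and a leading dot

candidates = [
    "projectA",
    "projectB",
    "projectB/file_arbitrary.2",
    ".snakemake_timestamp",
    "project.v2",              # hypothetical directory name containing a dot
]


def accepted(constraint, values):
    """Return the values that the constraint matches in full."""
    return [v for v in values if re.fullmatch(constraint, v)]


print(accepted(CONSTRAINT, candidates))   # ['projectA', 'projectB']
print(accepted(ALTERNATIVE, candidates))  # ['projectA', 'projectB', 'project.v2']
```

Either pattern keeps nested files and .snakemake_timestamp out of the project list. The alternative would also accept non-hidden files placed directly in the secondstep directory, so it only fits layouts where that directory contains nothing but project folders.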
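What you need is wildcard constraints:

This lets you define a regular expression that restricts what a wildcard is allowed to match. For example:
wildcard_constraints:
    project="[^/]+"

There are several ways to define constraints: globally, per rule, or inline. Here is an example of an inline constraint:
output: '{run}/report/{project,[^/]+}/arbitrary.all'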
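Thanks. I assumed it had to be done with wildcard constraints. However, your solution does not work (neither the globally defined constraint nor the inline constraint). Could you post a working example?

It is not "my solution" that does not work, it is yours. I simply answered your question as you asked it: how to force {project} to be only projectA or projectB. Whenever you provide a minimal example, I will provide a minimal working example (if the wildcard constraint is the only problem in your code).

The minimal example should work out of the box, without errors (at least it does for me; see the code at the end of the post). That is why I asked for a working example.

I have now managed to get it to work (see the edit). The problem was that the global wildcard constraint had no effect, most likely because an input function is used. If the constraint is specified inside the input function, it works as expected. Thanks for your help.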
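Since the constraint ended up inside the input function anyway, another option for the second question (inferring wildcards from directories only) is to list the subdirectories of the checkpoint output directly, instead of filtering glob_wildcards results with a regular expression. This is a sketch under assumptions and not something proposed in the thread: list_subdirectories is a hypothetical helper, and it has not been run against the pipeline above.

```python
import os
import tempfile


def list_subdirectories(root):
    """Return the sorted names of the immediate subdirectories of root."""
    with os.scandir(root) as entries:
        # is_dir() filters out plain files such as .snakemake_timestamp.
        return sorted(entry.name for entry in entries if entry.is_dir())


# Small demonstration on a throwaway copy of the example layout.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "secondstep")
    for project in ("projectA", "projectB"):
        os.makedirs(os.path.join(root, project))
        open(os.path.join(root, project, "file_arbitrary.1"), "w").close()
    open(os.path.join(root, ".snakemake_timestamp"), "w").close()
    projects = list_subdirectories(root)

print(projects)  # ['projectA', 'projectB']
```

In resolve_project, the project argument of expand could then be list_subdirectories(checkpoint_output). Because only directories are returned, neither the marker file nor nested files can become wildcard values, regardless of what characters the directory names contain.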