Snakemake: rule to delete files

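I have a large Snakefile (heavily simplified here) made up of rules a, b, c and d, each of which takes {path}.csv as input and writes {path}.<rule name>.csv.

(This setup lets me use rules like functions, by chaining file name suffixes in rule all.)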

Starting state:

$ ls -tr1
Snakefile
raw1.csv
raw2.csv

$ snakemake all
...

Afterwards:

$ ls -tr1
Snakefile
raw1.csv
raw2.csv
raw2.a.csv
raw2.a.b.csv
raw2.a.b.c.csv
raw2.a.b.c.d.csv
raw1.a.csv
raw1.a.b.csv
raw1.a.b.c.csv
raw1.a.b.c.a.csv
raw1.a.b.c.a.d.csv
raw2.a.b.c.d.a.csv
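
The listing above follows from how the {path} wildcard resolves: every requested file drops its last rule suffix to find its input, and the process repeats until a raw file is reached. A minimal sketch of that resolution in plain Python, not Snakemake's actual resolver; the helper name chain and the fixed rule set are assumptions made for illustration:

```python
def chain(target, rules=("a", "b", "c", "d")):
    """Return every file needed to build `target`, raw input first."""
    stem, ext = target.rsplit(".", 1)        # 'raw1.a.b' and 'csv'
    files = [target]
    while True:
        head, _, last = stem.rpartition(".")
        # Stop once the last component is no longer a rule suffix.
        if not head or last not in rules:
            break
        stem = head
        files.append(f"{stem}.{ext}")
    return files[::-1]


for name in chain("raw1.a.b.c.a.d.csv"):
    print(name)
```

Each printed name after the first corresponds to one job that Snakemake has to schedule for that target.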

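Now I would like to add a rule that deletes specific intermediate files (for example raw1.a.csv and raw2.a.b.csv), because I do not need them and they take up a lot of disk space. Because of the {path} wildcard, I cannot mark the outputs with temp().

Any suggestions? Thanks.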

OK, I figured it out. (Edit: actually, this solution does not work; it causes a race condition.)

rule a:
    input: '{path}.csv'
    output: '{path}.a.csv'
    shell: 'cp {input} {output}'
rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'
rule c:
    input: '{path}.csv'
    output: '{path}.c.csv'
    shell: 'cp {input} {output}'
rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'
rule remove:                          # <-- rule to delete a file
    input: '{path}'
    output: touch('{path}.removed')
    shell: 'rm {input}'
rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv',
           'raw1.a.csv.removed',      # <-- specify which files to rm
           'raw2.a.b.c.csv.removed',  # <-- specify which files to rm
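
One way to see the race condition mentioned in the edit note: rule remove and rule b both take raw1.a.csv as their only input, so both become runnable at the same moment and nothing forces remove to wait for b. The toy model below is an assumption-laden sketch in plain Python (the JOBS table and the runnable helper are hypothetical and only mimic input/output readiness, not Snakemake's scheduler):

```python
# job name: (input file, output file)
JOBS = {
    "a": ("raw1.csv", "raw1.a.csv"),
    "b": ("raw1.a.csv", "raw1.a.b.csv"),
    "remove": ("raw1.a.csv", "raw1.a.csv.removed"),
}


def runnable(existing):
    """Jobs whose input exists and whose output has not been produced yet."""
    return sorted(
        name
        for name, (src, dst) in JOBS.items()
        if src in existing and dst not in existing
    )


print(runnable({"raw1.csv"}))                # ['a']
print(runnable({"raw1.csv", "raw1.a.csv"}))  # ['b', 'remove']
```

If remove happens to run first, raw1.a.csv is gone before b can copy it.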

temp() does work in this case:

rule all:
    input: 'raw1.a.b.c.a.d.csv',
        'raw2.a.b.c.d.a.csv'

rule a:
    input: '{path}.csv'
    output: temp('{path}.a.csv')
    shell: 'cp {input} {output}'
rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'
rule c:
    input: '{path}.csv'
    output: temp('{path}.c.csv')
    shell: 'cp {input} {output}'
rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'

Running this creates the files raw1.a.b.c.a.d.csv, raw1.a.b.csv, raw2.a.b.c.d.csv and raw2.a.b.csv, and automatically deletes the files raw1.a.csv, raw2.a.csv, raw1.a.b.c.csv, raw2.a.b.c.csv, raw1.a.b.c.a.csv and raw2.a.b.c.d.a.csv.
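
Which files fall on which side can be worked out from the rule patterns alone: an output is temporary when the rule that writes it (its last suffix before .csv) wraps the output in temp(). The sketch below classifies files that way in plain Python. The helper names are hypothetical, and it only matches patterns; it does not model how Snakemake treats a temp() file that is also a requested target.

```python
ALL_RULES = {"a", "b", "c", "d"}
TEMP_RULES = {"a", "c"}  # rules whose output is wrapped in temp() above


def intermediates(target):
    """Rule outputs on the way to `target`, including `target` itself."""
    stem = target[: -len(".csv")]
    out = []
    while "." in stem and stem.rsplit(".", 1)[1] in ALL_RULES:
        out.append(stem + ".csv")
        stem = stem.rsplit(".", 1)[0]
    return out[::-1]


def split_temp(targets):
    """Split all produced files into (kept, temporary) by producing rule."""
    kept, temp = [], []
    for target in targets:
        for name in intermediates(target):
            rule = name[: -len(".csv")].rsplit(".", 1)[1]
            (temp if rule in TEMP_RULES else kept).append(name)
    return kept, temp


kept, temp = split_temp(["raw1.a.b.c.a.d.csv", "raw2.a.b.c.d.a.csv"])
print("kept:", kept)
print("temp:", temp)
```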

The job graph can be rendered with Graphviz:

$ snakemake --dag all | dot -Tpng > dag.png
