Snakemake rule to delete files
I have a large Snakefile that, greatly simplified, has the same shape as the rules a, b, c, d and all shown further down.
(This setup lets me use rules like functions by chaining filename suffixes in the all rule.)
Starting state:
$ ls -tr1
Snakefile
raw1.csv
raw2.csv
$ snakemake all
...
Afterwards:
$ ls -tr1
Snakefile
raw1.csv
raw2.csv
raw2.a.csv
raw2.a.b.csv
raw2.a.b.c.csv
raw2.a.b.c.d.csv
raw1.a.csv
raw1.a.b.csv
raw1.a.b.c.csv
raw1.a.b.c.a.csv
raw1.a.b.c.a.d.csv
raw2.a.b.c.d.a.csv
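Now I would like to add a rule that deletes specific intermediate files (for example raw1.a.csv and raw2.a.b.csv), because I no longer need them and they take up a lot of disk space. Because of the {path} wildcard, I cannot mark the outputs with temp().
Any suggestions? Thank you.
Edit: actually, the solution below does not work. It causes a race condition.
OK, I figured it out: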
rule a:
    input: '{path}.csv'
    output: '{path}.a.csv'
    shell: 'cp {input} {output}'

rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'

rule c:
    input: '{path}.csv'
    output: '{path}.c.csv'
    shell: 'cp {input} {output}'

rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'

rule remove:  # <-- rule to delete a file
    input: '{path}'
    output: touch('{path}.removed')
    shell: 'rm {input}'

rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv',
           'raw1.a.csv.removed',  # <-- specify which files to rm
           'raw2.a.b.c.csv.removed',  # <-- specify which files to rm
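One way to see where the race condition comes from is to look only at the input/output relationships of the jobs involved. The sketch below is a toy model in plain Python, not Snakemake's scheduler; the job names and the `must_run_before` helper are hypothetical and exist only for illustration. It shows that the remove job and the job of rule b both consume raw1.a.csv, while neither depends on the other.

```python
# Toy model (an assumption of this sketch): job name -> (inputs, outputs).
JOBS = {
    "a": ({"raw1.csv"}, {"raw1.a.csv"}),
    "b": ({"raw1.a.csv"}, {"raw1.a.b.csv"}),
    "remove": ({"raw1.a.csv"}, {"raw1.a.csv.removed"}),
}


def must_run_before(first, second, jobs=JOBS):
    """Return True if `second` needs an output of `first`, directly or transitively."""
    outputs = jobs[first][1]
    if outputs & jobs[second][0]:
        return True
    # Follow any other job that consumes an output of `first`.
    return any(
        name not in (first, second)
        and bool(outputs & jobs[name][0])
        and must_run_before(name, second, jobs)
        for name in jobs
    )


def unordered(x, y, jobs=JOBS):
    """Return True if the file dependencies impose no order between x and y."""
    return not must_run_before(x, y, jobs) and not must_run_before(y, x, jobs)


if __name__ == "__main__":
    print(must_run_before("a", "b"))   # rule b needs the output of rule a
    print(unordered("b", "remove"))    # nothing orders rm relative to cp
```

Because the two jobs are unordered in this model, nothing prevents the rm in remove from running before the cp in rule b has read raw1.a.csv.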
temp() works in this case:
rule all:
    input: 'raw1.a.b.c.a.d.csv',
           'raw2.a.b.c.d.a.csv'

rule a:
    input: '{path}.csv'
    output: temp('{path}.a.csv')
    shell: 'cp {input} {output}'

rule b:
    input: '{path}.csv'
    output: '{path}.b.csv'
    shell: 'cp {input} {output}'

rule c:
    input: '{path}.csv'
    output: temp('{path}.c.csv')
    shell: 'cp {input} {output}'

rule d:
    input: '{path}.csv'
    output: '{path}.d.csv'
    shell: 'cp {input} {output}'
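Running this creates raw1.a.b.c.a.d.csv, raw1.a.b.csv, raw2.a.b.csv and raw2.a.b.c.d.csv (the outputs of rules b and d), and automatically deletes the temp() outputs of rules a and c, such as raw1.a.csv, raw2.a.csv, raw1.a.b.c.csv, raw2.a.b.c.csv and raw1.a.b.c.a.csv, once no remaining job needs them. Note that raw2.a.b.c.d.a.csv is itself an output of rule a and is therefore marked temp() as well.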
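To check which files in the suffix chain end up flagged as temp(), the chain can be expanded by hand or with a few lines of Python. The following is a minimal sketch, assuming every target has the form base.suffix1.suffix2...csv and that each suffix names the rule that produced the file; the `classify` helper is hypothetical and is not part of Snakemake.

```python
def classify(targets, temp_rules):
    """Split all files produced for `targets` into kept and temp()-flagged sets.

    `temp_rules` holds the names of the rules whose output is wrapped in temp().
    """
    kept, flagged = set(), set()
    for target in targets:
        parts = target.split(".")            # e.g. ['raw1', 'a', 'b', 'csv']
        base, suffixes = parts[0], parts[1:-1]
        for i in range(1, len(suffixes) + 1):
            name = ".".join([base, *suffixes[:i], "csv"])
            producing_rule = suffixes[i - 1]  # the last suffix names the rule
            (flagged if producing_rule in temp_rules else kept).add(name)
    return kept, flagged


if __name__ == "__main__":
    kept, flagged = classify(
        ["raw1.a.b.c.a.d.csv", "raw2.a.b.c.d.a.csv"], temp_rules={"a", "c"}
    )
    print("kept:   ", sorted(kept))
    print("flagged:", sorted(flagged))
```

The sketch only reports which outputs carry the temp() flag. When Snakemake actually removes a flagged file depends on when the last job that needs it has finished.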
$ snakemake --dag all | dot -Tpng > dag.png