Snakemake: overriding the LSF (bsub) cluster config in a rule-specific manner
Is it possible to define default settings for memory and resources in the cluster config file, and then override them in a rule-specific manner when needed? Is the resources field in a rule directly tied to the cluster config file? Or is it just a fancy alternative to the params field, for readability purposes?

In the example below, how do I use the default cluster config for rule a, but apply custom changes (memory=40000 and rusage=15000) in rule b?

cluster.json:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Snakefile:

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'

Command:

snakemake --cluster-config cluster.json \
    --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
    -j 50

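I know it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile if possible. Or, if there is a better way of implementing this, please let me know.

You can add resources directly to each rule:
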
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    resources:
        mem_mb=40000
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=20000
    shell:
        'touch {output}'

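Then you should remove the resources parameter from the .json (the version below also drops memory), so that the command line does not override the Snakefile:

new.cluster.json:
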
{
    "__default__":
    {
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
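
With memory and resources gone from the cluster config, the submit command can no longer refer to {cluster.memory} or {cluster.resources}; the per-rule value has to come from the rule itself. Below is a minimal sketch of how the invocation might then look, assuming Snakemake's {resources.mem_mb} placeholder is used in the --cluster string and that the same value is acceptable for both -M and rusage (the question asked for two different numbers, so that part is a simplification). The helper name build_snakemake_command is hypothetical, and Python is used only to assemble and print the command line.

```python
import shlex


def build_snakemake_command(cluster_config="new.cluster.json", jobs=50):
    """Assemble a snakemake invocation whose bsub call reads memory from each rule.

    Hypothetical helper for illustration; it only builds the argument list.
    """
    # {resources.mem_mb} is filled in per job from the rule's `resources:` block.
    # {cluster.output} and {cluster.error} still come from the cluster config file.
    # Assumption: one value (mem_mb) drives both -M and the rusage string.
    bsub = (
        "bsub -M {resources.mem_mb} "
        '-R "rusage[mem={resources.mem_mb}] span[hosts=1]" '
        "-o {cluster.output} -e {cluster.error}"
    )
    return [
        "snakemake",
        "--cluster-config", cluster_config,
        "--cluster", bsub,
        "-j", str(jobs),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command for inspection; nothing is submitted here.
    print(shlex.join(build_snakemake_command()))
```

Every rule then needs a mem_mb value. Rules that do not declare one would need a fallback, for instance through --default-resources in Snakemake versions that provide it.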

In new.cluster.json you can actually define resources for specific rules. So in your case you would do the following:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Then, in the Snakefile, you can use these resources by loading new.cluster.json and referring to it in the rule:

import json

# Load the cluster config so rules can read their settings from it.
with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'
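
Indexing cluster_config["b"] directly raises a KeyError for any rule that has no entry of its own. A small helper that layers a rule's entry on top of __default__ avoids that. It also mirrors the way rule-specific entries are generally understood to take precedence over __default__ when {cluster.*} placeholders are filled in. The sketch below rests on that assumption; resolve_cluster_settings is a hypothetical name, not a Snakemake function, and the dictionary is an inline, abridged copy of new.cluster.json.

```python
def resolve_cluster_settings(cluster_config, rule_name):
    """Return the __default__ settings with the rule's own entry layered on top.

    Hypothetical helper: keys missing from the rule entry fall back to __default__.
    """
    settings = dict(cluster_config.get("__default__", {}))  # copy, do not mutate
    settings.update(cluster_config.get(rule_name, {}))
    return settings


# Abridged inline copy of new.cluster.json so the sketch is self-contained.
# Rule "b" lists only the keys that differ from the defaults.
cluster_config = {
    "__default__": {
        "memory": 20000,
        "resources": '"rusage[mem=8000] span[hosts=1]"',
        "output": "logs/cluster/{rule}.{wildcards}.out",
    },
    "b": {
        "memory": 40000,
        "resources": '"rusage[mem=15000] span[hosts=1]"',
    },
}

if __name__ == "__main__":
    for rule_name in ("a", "b"):
        print(rule_name, resolve_cluster_settings(cluster_config, rule_name)["memory"])
```

In the Snakefile, rule b could then use mem_mb=resolve_cluster_settings(cluster_config, "b")["memory"], and the same call works unchanged for rule a.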

If you take a closer look, you can see how I use these cluster configs in the wild.

After posting here, I found a related post. I won't delete my post, since this question shows more detail with examples.

I was hoping to set default resource constraints in the cluster config file, and then change them in a rule-specific manner in the Snakefile when needed. My thinking here is that I will only rarely need to change the resources.

That said, I have recently been thinking about how to implement what you describe here. I prefer the former because it is less verbose, but there is also an argument for keeping everything about a rule in the same file.