Snakemake: overriding LSF (bsub) cluster config in a rule-specific manner

configuration cluster-computing

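Is it possible to define default settings for memory and resources in the cluster config file, and then override them in a rule-specific manner when needed? Is the resources field in rules directly tied to the cluster config file, or is it just a fancy way of writing the params field for readability purposes?

In the example below, how do I use the default cluster config for rule a, but apply custom changes (memory=40000 and rusage=15000) in rule b?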

cluster.json:

{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}

Snakefile:

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    shell:
        'touch {output}'

Command used to run it:

snakemake --cluster-config cluster.json \
          --cluster "bsub -M {cluster.memory} -R {cluster.resources} -o logs.txt" \
          -j 50

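I know it is possible to define rule-specific resource requirements in the cluster config file, but I would prefer to define them directly in the Snakefile if possible.

Alternatively, if there is a better way of implementing this, please let me know.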

You can add resources directly to each rule:
rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    resources:
        mem_mb=40000
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=20000
    shell:
        'touch {output}'

You should then remove the resources parameter from the .json so that the command line does not override the Snakefile:
new.cluster.json:
{
    "__default__":
    {
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
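
One point worth spelling out: with a generic --cluster command, values declared under a rule's resources only reach bsub if the submission string refers to them, for example as {resources.mem_mb} rather than {cluster.memory}. The Python sketch below is not Snakemake code. It only imitates the placeholder substitution with str.format; the SimpleNamespace objects and the render_submit_command helper are hypothetical stand-ins, and the log paths are simplified to those of rule a.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for the per-job objects that the --cluster string
# can refer to. The values mirror rule a from the answer above.
resources = SimpleNamespace(mem_mb=40000)
cluster = SimpleNamespace(
    output="logs/cluster/a.out",  # simplified: wildcards left out
    error="logs/cluster/a.err",
)

# Submission template that takes memory from the rule's resources and the
# log paths from the cluster config file.
template = "bsub -M {resources.mem_mb} -o {cluster.output} -e {cluster.error}"


def render_submit_command(template, resources, cluster):
    """Fill the placeholders the way a per-job submission string is built."""
    return template.format(resources=resources, cluster=cluster)


command = render_submit_command(template, resources, cluster)
print(command)
```

Under that assumption, the snakemake call would presumably pass --cluster "bsub -M {resources.mem_mb} -o {cluster.output} -e {cluster.error}", so that each rule's mem_mb is what LSF sees.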

In new.cluster.json you can in fact define resources for specific rules. So in your case you would do the following:
{
    "__default__":
    {
        "memory": 20000,
        "resources": "\"rusage[mem=8000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    },
    "b":
    {
        "memory": 40000,
        "resources": "\"rusage[mem=15000] span[hosts=1]\"",
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err"
    }
}
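
As far as I understand the --cluster-config mechanism, a rule-specific entry is layered on top of __default__, so the "b" entry should only need the keys that actually differ. The sketch below imitates that lookup in plain Python as an illustration of the idea, not Snakemake's own implementation; settings_for is a hypothetical helper name.

```python
# The cluster config from above as a Python dict, with "b" reduced to the
# two keys that differ from the defaults.
cluster_config = {
    "__default__": {
        "memory": 20000,
        "resources": '"rusage[mem=8000] span[hosts=1]"',
        "output": "logs/cluster/{rule}.{wildcards}.out",
        "error": "logs/cluster/{rule}.{wildcards}.err",
    },
    "b": {
        "memory": 40000,
        "resources": '"rusage[mem=15000] span[hosts=1]"',
    },
}


def settings_for(rule_name, config):
    """Start from the defaults, then overlay any rule-specific keys."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule_name, {}))
    return merged


settings_a = settings_for("a", cluster_config)  # no entry for a: defaults only
settings_b = settings_for("b", cluster_config)  # memory and resources overridden

print(settings_a["memory"], settings_a["resources"])
print(settings_b["memory"], settings_b["resources"])
print(settings_b["output"])
```

If that reading is right, rule a keeps memory 20000 and rule b gets memory 40000 with rusage[mem=15000], while both share the default log paths.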

Then, in the Snakefile, you can refer to these resources by loading new.cluster.json and referencing it in the rules:
import json

# Load the cluster config so that rules can read values from it
with open('new.cluster.json') as fh:
    cluster_config = json.load(fh)

rule all:
    input:
        'a_out.txt', 'b_out.txt'

rule a:
    input:
        'a.txt'
    output:
        'a_out.txt'
    shell:
        'touch {output}'

rule b:
    input:
        'b.txt'
    output:
        'b_out.txt'
    resources:
        mem_mb=cluster_config["b"]["memory"]
    shell:
        'touch {output}'

If you take a closer look, you can see how I use these cluster configs in the wild.
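
After posting here I found this related post. I am not going to delete my post, because this question shows more detail with examples.

I wanted to set default resource constraints in the cluster config file and then change them in a rule-specific manner in the Snakefile when needed. My thinking here was that I would only rarely need to change the resources. That said, I have recently been thinking about how to implement what you describe here. I prefer the former because it is less verbose, but there is also an argument for keeping everything about a rule in the same file.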