Python: Manipulating Nextflow variables outside of the script

python groovy workflow

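I have a process, iterate_list. It takes a list and does something with each item in the list. When the script runs, it takes two inputs: the list, and the item it needs to process (which it gets as a consumer from a rabbitmq queue).

Currently, I give a Python script the entire list; it iterates over every item, does the processing (as one big chunk), and returns after completion. This works fine, but if the system restarts, it starts all over again.

I was wondering how I can make it so that every time my Python script processes a single item, it returns that item, I remove it from the list, and then pass the new list to the process. That way, in case of a system restart or crash, Nextflow knows where it left off and can continue from there.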

import groovy.json.JsonSlurper

def jsonSlurper = new JsonSlurper()
def cfg_file = new File('/config.json')
def analysis_config = jsonSlurper.parse(cfg_file)
def cfg_json = cfg_file.getText()
def list_of_items_to_process = [] 

items = Channel.from(analysis_config.items.keySet())

for (String item : items) {
    list_of_items_to_process << item
    } 

process iterate_list{
    echo true

    input:
    list_of_items_to_process

    output:
    val 1 into typing_cur

    script:
    """
    python3.7 process_list_items.py ${my_queue} \'${list_of_items_to_process}\'
    """ 
}

process signal_completion{

    echo true

    input:
    val typing_cur

    script:
    """
    echo "all done!"
    """
}

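It looks like what you're really trying to do is manipulate a global ArrayList from within a Nextflow process. As far as I know, there's no way to do exactly that. That is what channels are for.

It's also unclear whether you actually need to remove any items from list_of_items_to_process. Nextflow can already reuse cached results with the -resume option. So why not just pass in the full list together with a single item to process?
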
items = Channel.from(['foo', 'bar', 'baz'])

items.into {
    items_ch1
    items_ch2
}

process iterate_list{

    input:
    val item from items_ch1
    val list_of_items_to_process from items_ch2.collect()

    """
    python3.7 process_list_items.py "${item}" '${list_of_items_to_process}'
    """
}

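I can only guess how your Python script uses its arguments, but if the list of items to process is just a placeholder, you could even pass in a single-element list containing only the item to process:
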
items = Channel.from(['foo', 'bar', 'baz'])

process iterate_list{

    input:
    val item from items

    """
    python3.7 process_list_items.py "${item}" '[${item}]'
    """
}
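
For illustration only, here is a rough sketch of how process_list_items.py might read the two arguments passed above. The real script is not shown in the question, so the function names parse_groovy_list and process_item are hypothetical, and the processing itself is a placeholder. The sketch relies on the fact that a Groovy list interpolated into the script block is rendered as [foo, bar, baz] (not JSON), and it assumes the items contain no commas or brackets.

```python
import sys


def parse_groovy_list(text):
    """Turn a Groovy-rendered list such as '[foo, bar, baz]' into a Python list.

    Assumption: items contain no commas or square brackets.
    """
    inner = text.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1]
    # Split on commas and drop empty pieces, so '[]' yields an empty list.
    return [part.strip() for part in inner.split(",") if part.strip()]


def process_item(item, all_items):
    """Placeholder for the real work on one item; returns a status string."""
    if item not in all_items:
        raise ValueError(f"{item!r} is not in the list of items to process")
    position = all_items.index(item) + 1
    return f"processed {item} ({position}/{len(all_items)})"


# Expected call: python3.7 process_list_items.py <item> '<list>'
if __name__ == "__main__" and len(sys.argv) == 3:
    print(process_item(sys.argv[1], parse_groovy_list(sys.argv[2])))
```

Because each task then handles exactly one item, a rerun with -resume only has to repeat the items whose tasks did not complete, which is the behaviour the question asks for without mutating the list.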

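Since Nextflow doesn't allow the use of Google Cloud Filestore, I have been somewhat abusing the workflow to handle my rabbitmq RPCs rather than actually doing the work itself. But that is changing, and now my biggest problem is storage: how do I use inputs from a cloud storage bucket, and how do I then process a file and upload it back to an output bucket? I'll ask that question tomorrow. Nextflow has a small community.

@daudnadeem: Basically, just specify them using the gs:// protocol, e.g. gs://yourbucket/path_to_file. Then make sure to specify a gs bucket for storing the workflow's intermediate results. You can do this with the -work-dir option, e.g. nextflow run main.nf -work-dir gs://yourbucket/project/workdir. You may also want a custom parameter called --outdir to specify a gs bucket for publishing outputs with the publishDir directive. Then add --outdir gs://yourbucket/project/results to the command line.

OK. Say I have a file gs://this-bucket/file.txt, and the work dir I give is gs://this-bucket. If I want to access this file, do I point to it with gs://this-bucket/file.txt, or does Nextflow download the file to local storage so that I just refer to it as file.txt?

@daudnadeem: Actually, I think you need to specify a bucket subdirectory as your work dir, as per the note. But you should be able to point to your file with gs://this-bucket/file.txt. Then, for example, include params.foobar = "gs://this-bucket/file.txt" and foobar = file(params.foobar) in your script, and you should be able to use it in any process, i.e. process baz { input: file(foobar) … }

@daudnadeem: "Any input data not stored in a Google Storage bucket will be automatically transferred to the pipeline work bucket" (quoted from the documentation)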
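
If the workflow is launched from Python (an assumption; the thread does not say how nextflow is started), the command line from the comments could be assembled along these lines. The helper names is_bucket_subdir and build_nextflow_command are hypothetical, and the subdirectory check only encodes the commenter's belief that the work dir should be a bucket subdirectory rather than the bucket root.

```python
def is_bucket_subdir(uri):
    """Return True for gs://bucket/some/path, False for a bare gs://bucket."""
    if not uri.startswith("gs://"):
        return False
    bucket, _, path = uri[len("gs://"):].partition("/")
    return bool(bucket) and bool(path.strip("/"))


def build_nextflow_command(script, work_dir, outdir):
    """Build the argument list for `nextflow run` with a gs:// work dir.

    -work-dir is Nextflow's own option; --outdir is the custom pipeline
    parameter suggested in the comments (read as params.outdir).
    """
    if not is_bucket_subdir(work_dir):
        raise ValueError(f"work dir should be a bucket subdirectory: {work_dir}")
    if not outdir.startswith("gs://"):
        raise ValueError(f"outdir should be a gs:// location: {outdir}")
    return ["nextflow", "run", script, "-work-dir", work_dir, "--outdir", outdir]
```

The returned list can be handed to subprocess.run as is, which avoids shell quoting issues with the bucket paths.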