Luigi: can a task run other tasks without creating a dependency?

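In luigi, I understand that if a task yields to another task, the second task becomes a new dependency of the original task, which causes the original task's `run` method to be re-run after the yielded-to task completes.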
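The re-run behaviour described above can be illustrated without luigi at all. The following is a minimal plain-Python sketch, assuming a simplified scheduler that restarts a generator-style `run` from the top whenever a yielded dependency is not yet complete. `drive`, `main_run`, `completed`, and `calls` are hypothetical names for illustration, not luigi APIs, and the real worker is more involved.

```python
completed = set()   # names of "tasks" whose output already exists
calls = []          # one entry per fresh start of main_run


def main_run():
    """Stand-in for a task's run() that yields dynamic dependencies."""
    calls.append("main_run started")
    # ... expensive work would happen here, on every start ...
    yield ["other-1", "other-2"]
    completed.add("main")


def drive(run_fn):
    """Restart run_fn from scratch until every yielded dependency is complete."""
    while True:
        gen = run_fn()
        try:
            requested = next(gen)
            # Keep advancing only while everything requested is already done.
            while all(dep in completed for dep in requested):
                requested = next(gen)
        except StopIteration:
            return  # run_fn reached its end
        # Something was incomplete: "run" those tasks, then start over.
        completed.update(requested)


drive(main_run)
print(len(calls))
```

In this sketch the body of `main_run` starts twice: once to discover the dependencies, and once more after they have been completed. That second start is the one the question is trying to avoid.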

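However, in some cases I want a task to defer to another task without the deferred-to task becoming a dependency. The reason is that I don't want the current task's `run` method to be re-run after the other task completes.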

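Yes, I know that my `run` method should be idempotent. Nonetheless, there are cases where I absolutely do not want that method to run a second time after yielding to another task.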

I have come up with a way to do this, but I'm not sure it is the best solution, and I'd like to hear any suggestions you may have.

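Suppose there are two tasks: `MainTask` and `OtherTask`. `MainTask` is invoked from the command line with various arguments. Depending on how those arguments are set, `MainTask` might invoke `OtherTask`. If so, I do not want `MainTask`'s `run` method to be invoked a second time.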

import luigi


class OtherTask(luigi.Task):
    # Under some circumstances, this task can be invoked
    # from the command line, and it can also be invoked
    # in the normal luigi manner as a dependency for one
    # or more other tasks.
    # It also might be yielded to, as is done in the
    # "run" method for `MainTask`, below.

    id = luigi.parameter.IntParameter()

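    def complete(self):
        # ...
        # Return True or False depending on various tests.
        # Placeholder: defer to luigi's default check (do the outputs exist?).
        return super().complete()

    def requires(self):
        # return [ ... various dependencies ... ]
        return []  # placeholder so the sketch is valid Python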

    def run(self):
        # do stuff with self.id
        # ...
        with self.output().open('w') as f:
            f.write('OK')

    def output(self):
        # Assumption: a local file target; the path is elided in the original.
        return luigi.LocalTarget('... something ...')

class MainTask(luigi.Task):
    # Parameters are expected to be supplied on the command line.
    param1 = luigi.parameter.IntParameter()
    param2 = luigi.parameter.BoolParameter()
    # ... etc. ...

    def run(self):
        #
        # Here's how I keep this "run" method from being
        # invoked more than once. Is there a better way
        # to invoke `OtherTask` without having it cause 
        # this current task to be re-invoked?
        if self.complete():
            return

        # Normal "run" processing for this task ...
        # ... etc. ...

        # Possibly run `OtherTask` multiple times, only if
        # certain conditions are met ... 
        tasks = []
        if the_conditions_are_met:
            ids = []
            # Append multiple integer ID's to the `ids` list.
            # Calculate each ID depending upon the values
            # passed in via self.param1, self.param2, etc.
            # Do some processing depending on these ID's.
            # ... etc. ...

            # Then, create a list of tasks to be invoked,
            # each one taking one of these ID's as a parameter.
            for the_id in ids:
                tasks.append(OtherTask(id=the_id))

        with self.output().open('w') as f:
            f.write('OK')

        # Optionally yield after marking this task as 
        # complete, so that when the yielded tasks have
        # all run, this task's "run" method can test for
        # completion and not re-run its logic.
        if tasks:
            yield tasks

    def output(self):
        # Assumption: a local file target; the path is elided in the original.
        return luigi.LocalTarget('... whatever ...')

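Per my comments, using an auxiliary class seems to work. It runs only once, and even if the main class's `run` method is invoked multiple times, it just reuses the auxiliary class's output data instead of recomputing it.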

import json

import luigi


class OtherTask(luigi.Task):
    # Under some circumstances, this task can be invoked
    # from the command line, and it can also be invoked
    # in the normal luigi manner as a dependency for one
    # or more other tasks.
    # It also might be yielded to, as is done in the
    # "run" method for `MainTask`, below.

    id = luigi.parameter.IntParameter()

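    def complete(self):
        # ...
        # Return True or False depending on various tests.
        # Placeholder: defer to luigi's default check (do the outputs exist?).
        return super().complete()

    def requires(self):
        # return [ ... various dependencies ... ]
        return []  # placeholder so the sketch is valid Python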

    def run(self):
        # do stuff with self.id
        # ...
        with self.output().open('w') as f:
            f.write('OK')

    def output(self):
        # Assumption: a local file target; the path is elided in the original.
        return luigi.LocalTarget('... something ...')

class AuxiliaryTask(luigi.Task):

    def requires(self):
        # return [ ... various dependencies ... ]
        return []  # placeholder so the sketch is valid Python

    def run(self):                
        ids = []
        # Append multiple integer ID's to the `ids` list.
        # Calculate each ID depending upon the values
        # passed to this task via its parameters. Then ...
        with self.output().open('w') as f:
            f.write(json.dumps(ids))

    def output(self):
        # Assumption: a local file target; the path is elided in the original.
        return luigi.LocalTarget('... something else ...')

class MainTask(luigi.Task):
    # Parameters are expected to be supplied on the command line.
    param1 = luigi.parameter.IntParameter()
    param2 = luigi.parameter.BoolParameter()
    # ... etc. ...

    def requires(self):
        return [ self.clone(AuxiliaryTask) ]

    def run(self):
        # This method could get re-run after the yields,
        # below. However, it just re-reads its input, instead
        # of that input being recalculated. And in the second
        # invocation, luigi's dependency mechanism will prevent
        # any re-yielded-to tasks from repeating what they did
        # before.
        ids = []
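        # requires() returns a list, so input() is a list of targets.
        with self.input()[0].open('r') as f:
            # Parse the IDs that AuxiliaryTask wrote with json.dumps.
            ids = json.loads(f.read())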

        if ids:
            tasks = []

            # Create a list of tasks to be invoked,
            # each one taking one of these ID's as a parameter.
            # Then, yield to each of these tasks.
            for the_id in ids:
                tasks.append(OtherTask(id=the_id))
            if tasks:
                yield tasks

        with self.output().open('w') as f:
            f.write('OK')


    def output(self):
        # Assumption: a local file target; the path is elided in the original.
        return luigi.LocalTarget('... whatever ...')
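The hand-off from `AuxiliaryTask` to `MainTask` is a JSON round trip through a file. Here is a minimal sketch of just that step, assuming a local temporary file in place of a luigi target. `write_ids` and `read_ids` are hypothetical helper names, and the ID values are made up for illustration.

```python
import json
import os
import tempfile


def write_ids(path, ids):
    """What AuxiliaryTask.run does: serialize the ID list once."""
    with open(path, "w") as f:
        f.write(json.dumps(ids))


def read_ids(path):
    """What MainTask.run does on every invocation: parse, never recompute."""
    with open(path) as f:
        return json.loads(f.read())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ids.json")
    write_ids(path, [3, 5, 8])
    first = read_ids(path)
    second = read_ids(path)  # a re-run simply reads the same list again
```

The writer uses `json.dumps` and the reader uses `json.loads`, so a repeated read returns the same list without redoing the calculation that produced it.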

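In fact, the `if self.complete(): return` guard only works when all of the yielded tasks succeed and never need to be re-run, because it forces `MainTask`'s completion test to be true after its first run. On a re-run, `MainTask` would appear complete even though a yielded task failed. The best way I have come up with to handle this is to take all of the logic out of `MainTask`'s `run` method and put it into the `run` method of a new `AuxiliaryTask`, which is set up as a dependency of `MainTask`. `AuxiliaryTask` writes out the results of its computation, and `MainTask` now just reads them as input and turns them into `OtherTask` instances. Maybe this is the only way to make it work properly, but I still wish there were a way for one task to defer to another without the first task's `run` method being re-invoked.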
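The failure mode of the guard-based workaround can be reproduced in a few lines of plain Python. This is a simulation under stated assumptions, not luigi code: `outputs` stands in for files on disk, and `attempt` is a hypothetical one-pass scheduler that starts `run` and tries each requested task once.

```python
outputs = set()  # stands in for output files on disk


def main_run(ids):
    """Mimics MainTask.run with the completion guard at the top."""
    if "main" in outputs:      # the `if self.complete(): return` guard
        return
    outputs.add("main")        # marked complete *before* yielding
    yield [f"other-{i}" for i in ids]


def attempt(ids, failing):
    """One pass: start run(), then try each requested task exactly once."""
    gen = main_run(ids)
    requested = next(gen, [])  # [] when run() returns without yielding
    for name in requested:
        if name not in failing:
            outputs.add(name)  # the task succeeded and wrote its output
    return requested


first = attempt([1, 2], failing={"other-2"})   # other-2 fails this time
second = attempt([1, 2], failing=set())        # retry with nothing failing
```

On the retry the guard sees the marker written during the first pass and returns immediately, so `other-2` is never requested again and its output is never produced.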