Python Google Dataflow: how to wait for an external webhook while transforming a collection?

I have code that reads an Xlsx file and, for each row, runs a process on a specific column.

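The problem is in the "Transform" part of Dataflow. I implemented a specific method that takes the value emitted by the reader and sends that data to an external server. The external server processes the data (which can take a few minutes) and then makes a POST request with the result. (The URL for that POST request is specified in the original request.)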

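My question is: how can my ParDo method be notified when the external process has finished (that is, when the external callback arrives)?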

Here is my code so far:

import logging, argparse
import apache_beam as beam
from apache_beam.io import gcsio
from apache_beam.utils.options import PipelineOptions

from openpyxl import load_workbook


# @See https://cloud.google.com/dataflow/model/custom-io-python#ptransform-wrappers
class FileReader():
    """A file reader implementation"""

    def __init__(self, path, *args, **kwargs):
        self.path = path

    def reader(self):
        return XlsxFileReader(self.path)


class XlsxFileReader():
    """The Xlsx file reader"""
    def __init__(self, path):
        self.path = path

    def _clean_value(self, value):
        if value is None:
            return None

        value = unicode(value)

        try:
            value = value.encode('utf-8')
        except UnicodeEncodeError:
            pass

        return value

    def __iter__(self):
        wb = load_workbook(filename=self.file, read_only=True)
        sheet_name = wb.get_sheet_names()[0]
        ws = wb[sheet_name]
        for line, row in enumerate(ws.rows):
            cell_value = self._clean_value(row[0].value)
            if cell_value is not None and cell_value.find('@') > 0:
                yield cell_value, line
                break

    def __enter__(self):
        self.file = gcsio.GcsIO().open(self.path, 'r')
        return self

    def __exit__(self, *args, **kwargs):
        self.file.close()


class ComputeWordLengthFn(beam.DoFn):
    def process(self, context):
        # Here, what I need is to send a request to an external API, which returns the result to the `callback` parameter.
        # I know how to do that using requests
        #
        # ***********************************************************
        # ---> BUT HOW can I know when that external service is done with my data and has called back my `callback` url?
        # ***********************************************************
        # Placeholder: the element should only be emitted once the external
        # service has made its request to the `callback` url on my instance.
        yield context.element[0]


def run(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--input',
        dest='input',
        default='gs://norbert-verify-staging/growthlist.xlsx',
        help='Input file to process.'
    )
    parser.add_argument(
        '--output',
        dest='output',
        required=True,
        help='Output file to write results to.'
    )
    known_args, pipeline_args = parser.parse_known_args(argv)

    pipeline_options = PipelineOptions(pipeline_args)
    p = beam.Pipeline(options=pipeline_options)

    p | 'read' >> beam.io.Read(FileReader(known_args.input)) \
      | 'verify' >> beam.ParDo(ComputeWordLengthFn()) \
      | 'write' >> beam.io.Write(beam.io.TextFileSink(known_args.output))

    p.run()


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()

I hope this is clear. Let me know if you need more details.

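I'm not sure I fully understand your question, but it sounds like you are asking whether Beam offers a way to have DoFn.process() invoked after a given callback has been called. Beam does not currently provide such a feature.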

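What you can do here is wait inside the ComputeWordLengthFn.process() method until the request for that particular element has completed (the exact way to perform that wait depends on the external API).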
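
One possible way to do that wait is plain polling with a timeout. The sketch below is only an illustration under assumptions: `wait_for_result`, `CallbackTimeout` and the `fetch_result` callable are hypothetical names that do not come from Beam or from the question. `fetch_result` stands in for whatever lookup the external API allows, for example a status endpoint or a store that the callback handler writes into.

```python
import time


class CallbackTimeout(Exception):
    """Raised when no result shows up before the deadline (hypothetical name)."""


def wait_for_result(fetch_result, timeout_s=600.0, poll_interval_s=5.0,
                    sleep=time.sleep, clock=time.time):
    """Block until fetch_result() returns something other than None.

    fetch_result is an assumed zero-argument callable that returns None
    while the external service is still working, and the result once it
    is available. sleep and clock are injectable so the helper can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        result = fetch_result()
        if result is not None:
            return result
        if clock() >= deadline:
            # Give up instead of blocking the worker forever.
            raise CallbackTimeout('no result after %s seconds' % timeout_s)
        sleep(poll_interval_s)
```

Inside process(), the request would be sent first, and the element would be yielded only after `wait_for_result(...)` returns. Each element then blocks for as long as the external service takes, so the timeout should be chosen with the "few minutes" of processing time in mind.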

Let me know if I have misunderstood your question.