Google Cloud Dataflow: How to create a read transform in Apache Beam using ParDo and DoFn


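The recommended way to write a simple source is to use Read transforms and ParDo. Unfortunately, the Apache Beam documentation has let me down here. I am trying to write a simple unbounded data source that emits events using ParDo, but the compiler keeps complaining about the input type of the DoFn object: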

message: 'The method apply(PTransform<? super PBegin,OutputT>) in the type PBegin is not applicable for the arguments (ParDo.SingleOutput<PBegin,Event>)'

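A DoFn performs element-wise processing. As written, ParDo.of(new ReadFn()) has type PTransform<PCollection<? extends PBegin>, PCollection<Event>>. Specifically, ReadFn declares that it takes elements of type PBegin and returns zero or more elements of type Event. PBegin, however, is the starting point of a pipeline, not an element inside a PCollection, so this transform cannot be applied to it.

Instead, you should use an actual Read operation; a variety of them are provided. If you have a specific in-memory collection to use, you can also use Create.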
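To make the type mismatch concrete, here is a minimal sketch of a root transform that compiles. It assumes a String stand-in for the Event type, and the names SeededRead, ReadFn and the "seed" element are hypothetical. Create takes PBegin as input, and the ParDo is then applied to the resulting PCollection rather than to PBegin itself.

```java
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example: String stands in for the Event type from the question.
public class SeededRead extends PTransform<PBegin, PCollection<String>> {

    @Override
    public PCollection<String> expand(PBegin input) {
        return input
            // Create is a root transform, so it can be applied to PBegin.
            .apply(Create.of("seed"))
            // ParDo now receives a PCollection<String>, which is what it expects.
            .apply(ParDo.of(new ReadFn()));
    }

    // The DoFn's input type is the seed element type, not PBegin.
    static class ReadFn extends DoFn<String, String> {
        @ProcessElement
        public void process(@Element String seed, OutputReceiver<String> out) {
            // Placeholder for custom read logic: emit three events per seed.
            for (int i = 0; i < 3; i++) {
                out.output("event-" + i);
            }
        }
    }
}
```

Running this requires the Beam Java SDK and a runner (for example the direct runner) on the classpath. It produces a bounded collection; it does not by itself address the unbounded case.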

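If you need to create a custom source, you should use the Read transform. Since you are using timers, you probably want to create an unbounded source (a stream of elements).

Can you tell me more about the kind of source you are trying to write? There may be a simpler way.

@jkff Ultimately I want to poll a SQL database every 5 seconds for all new records and output them as protobuf objects. I don't think any existing Beam source can do that.

OK. If you want to watch for new records in the query results, you should probably use the Watch transform. Please take a look and see whether it fits your needs.

I'd like to clarify that this guidance applies only when you cannot implement your source using ParDo. The Source API is very cumbersome to program against, and you should avoid it if you can express your IO without it. The piece missing from your ParDo solution is the collection passed to the ParDo: usually something like Create.of(seed element). For example, to read files: Create.of(filepattern) + ParDo(expand the filepattern into files) + fusion break + ParDo(read each file).

Ben is right that if you are implementing a streaming IO you will probably need an unbounded source; but you might not. If this is a simple polling problem, you can get the desired result by combining a GenerateSequence transform, used as a periodic "tick generator" (with the method that lets you control the rate at which data is generated), with a ParDo that polls for more data on each tick.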
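The tick-generator suggestion can be sketched roughly as follows. This is only an illustration under assumptions: the class names PollingRead and PollFn, the constructor parameters, and the String output are hypothetical, and the database query is replaced by a placeholder. GenerateSequence.withRate controls how often a tick is emitted; leaving out to(...) (here, passing a negative maxTicks) keeps the sequence unbounded.

```java
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

// Hypothetical polling source: one tick per period, one poll per tick.
public class PollingRead extends PTransform<PBegin, PCollection<String>> {
    private final Duration period; // e.g. Duration.standardSeconds(5)
    private final long maxTicks;   // negative means unbounded (assumption of this sketch)

    public PollingRead(Duration period, long maxTicks) {
        this.period = period;
        this.maxTicks = maxTicks;
    }

    @Override
    public PCollection<String> expand(PBegin input) {
        // Emit one Long per period; this is the "tick generator".
        GenerateSequence ticks = GenerateSequence.from(0).withRate(1, period);
        if (maxTicks >= 0) {
            // Bound the sequence, which is convenient for tests and demos.
            ticks = ticks.to(maxTicks);
        }
        return input.apply(ticks).apply(ParDo.of(new PollFn()));
    }

    static class PollFn extends DoFn<Long, String> {
        @ProcessElement
        public void process(@Element Long tick, OutputReceiver<String> out) {
            // Placeholder: a real implementation would query the database here
            // and output every record that is new since the previous poll.
            out.output("polled-at-tick-" + tick);
        }
    }
}
```

For the use case described above, the transform would be applied as pipeline.apply(new PollingRead(Duration.standardSeconds(5), -1)). Tracking which records are "new" between polls is left out of this sketch and would need state or a high-water mark in the query.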
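Code from the question:

public class TestIO extends PTransform<PBegin, PCollection<Event>> {

    @Override
    public PCollection<Event> expand(PBegin input) {
        // This is the line the compiler rejects: ParDo expects a PCollection as input, not PBegin.
        return input.apply(ParDo.of(new ReadFn()));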
    }

    private static class ReadFn extends DoFn<PBegin, Event> {
        @ProcessElement
        public void process(@TimerId("poll") Timer pollTimer) {
            Event testEvent = new Event(...);

            //custom logic, this can happen infinitely
            for(...) {
                context.output(testEvent);
            }
        }
    }
}