Apache NiFi: How to extract and manipulate data in a NiFi processor


apache-nifi


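I am trying to write a custom NiFi processor that takes the content of an incoming flowfile, performs some math operations on it, and writes the result to an outgoing flowfile. Is there a way to dump the content of the incoming flowfile into a String or something similar? I have been searching for a while, and it does not seem to be that simple. If anyone could point me to a good tutorial on doing something like this, I would greatly appreciate it.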

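The documentation describes the process of creating a custom processor well. In your specific case, I would start with the section and the pattern that cover this kind of work. Any existing processor that does similar work is a good example to learn from; all of the source code is publicly available.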

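Essentially, you need to implement the `onTrigger()` method in your processor class: read the flowfile content and parse it into the expected format, perform your operations, and then re-populate the resulting flowfile content. Your source code would look something like this: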

    // Imports required by the method below (IOUtils comes from Apache Commons IO)
    import java.nio.charset.StandardCharsets;
    import java.text.DecimalFormat;
    import java.util.concurrent.atomic.AtomicBoolean;
    import java.util.concurrent.atomic.AtomicReference;
    import org.apache.commons.io.IOUtils;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.logging.ComponentLog;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.exception.ProcessException;

    // Inside your processor class; REL_SUCCESS and REL_FAILURE are the
    // relationships that class is assumed to define.
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }

        final ComponentLog logger = getLogger();
        final AtomicBoolean error = new AtomicBoolean();
        final AtomicReference<String> result = new AtomicReference<>(null);

        // This uses a lambda function in place of a callback for InputStreamCallback#process()
        session.read(flowFile, in -> {
            long start = System.nanoTime();

            // Read the flowfile content into a String
            // TODO: May need to buffer this if the content is large
            try {
                final String contents = IOUtils.toString(in, StandardCharsets.UTF_8);
                // MyMathOperationService is a placeholder for your own logic
                result.set(new MyMathOperationService().performSomeOperation(contents));

                long stop = System.nanoTime();
                if (logger.isDebugEnabled()) {
                    final long durationNanos = stop - start;
                    DecimalFormat df = new DecimalFormat("#.###");
                    logger.debug("Performed operation in " + durationNanos + " nanoseconds (" + df.format(durationNanos / 1_000_000_000.0) + " seconds).");
                }
            } catch (Exception e) {
                error.set(true);
                logger.error(e.getMessage() + " Routing to failure.", e);
            }
        });

        if (error.get()) {
            session.transfer(flowFile, REL_FAILURE);
        } else {
            // Again, a lambda takes the place of the OutputStreamCallback#process()
            FlowFile updatedFlowFile = session.write(flowFile, out -> {
                final String resultString = result.get();
                final byte[] resultBytes = resultString.getBytes(StandardCharsets.UTF_8);

                // TODO: This can use a while loop for performance
                out.write(resultBytes, 0, resultBytes.length);
                out.flush();
            });
            session.transfer(updatedFlowFile, REL_SUCCESS);
        }
    }
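
The read, compute, write steps can be tried outside NiFi first. The following is a minimal sketch in plain Java, not the answer's actual service: `SumLinesSketch` is a hypothetical stand-in for `MyMathOperationService`, and it assumes the flowfile content is one decimal number per line, which it sums.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for MyMathOperationService (assumption: one number per line).
final class SumLinesSketch {

    // Reads the whole stream into a String; only suitable for small content.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // The "math operation": parse each non-blank line as a decimal and add them up.
    // A malformed line throws NumberFormatException, which the processor's
    // catch block would turn into a route to failure.
    static String performSomeOperation(String contents) {
        BigDecimal total = BigDecimal.ZERO;
        for (String line : contents.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                total = total.add(new BigDecimal(trimmed));
            }
        }
        return total.toPlainString();
    }

    // Mirrors the read -> compute -> write shape of onTrigger, without NiFi.
    static void process(InputStream in, OutputStream out) throws IOException {
        String result = performSomeOperation(readAll(in));
        out.write(result.getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "1.5\n2\n\n3.25\n".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        process(in, out);
        System.out.println(new String(out.toByteArray(), StandardCharsets.UTF_8)); // prints 6.75
    }
}
```

Keeping the operation in a plain class like this means it can be unit tested on its own, and the processor only has to move bytes between the session streams and that class.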


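Daggett is right that the processor he suggests is a good place to start, because it shortens the development lifecycle (no need to build a NAR, deploy it, and restart NiFi in order to use it). Once you have the correct behavior, you can easily copy and paste the code into the generated skeleton and deploy it once.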

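It is best to start with the ExecuteScript or ExecuteGroovyScript processor. Groovy is Java-based, so it should be straightforward. For an example of a full processor, look at the source code of any similarly simple processor.

I am curious why the NiFi Developer Guide uses the OutputStreamCallback.process() function if it can be replaced by a lambda. Is there any reason or situation in which a lambda cannot be used?

The Developer Guide was originally written before Java 8 was supported (or even existed) and before lambdas were available. The more verbose form still works, and it can show developers who are unfamiliar with lambdas what is happening. The documentation is treated as code, and community improvements are always welcome.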
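
To illustrate the lambda question, here is a small sketch that runs without NiFi. `OutputCallback` and `write` are hypothetical stand-ins, assumed to have the same shape as NiFi's `OutputStreamCallback` and `session.write(flowFile, callback)`; they are not the NiFi API itself. Because the callback interface has a single abstract method, the anonymous class and the lambda are interchangeable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class CallbackStylesSketch {

    // Hypothetical stand-in shaped like NiFi's OutputStreamCallback:
    // one abstract method, so a lambda can implement it.
    @FunctionalInterface
    interface OutputCallback {
        void process(OutputStream out) throws IOException;
    }

    // Hypothetical stand-in for session.write(flowFile, callback): runs the
    // callback against an in-memory stream and returns what was written.
    static String write(OutputCallback callback) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            callback.process(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        final byte[] payload = "42".getBytes(StandardCharsets.UTF_8);

        // Pre-Java 8 style: an anonymous inner class.
        String verbose = write(new OutputCallback() {
            @Override
            public void process(OutputStream out) throws IOException {
                out.write(payload);
            }
        });

        // Java 8+ style: the equivalent lambda.
        String concise = write(out -> out.write(payload));

        System.out.println(verbose.equals(concise)); // prints true
    }
}
```

The two calls behave the same; the lambda only removes the boilerplate around the single `process` method.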