NiFi: MissingFlowFileException during processing causes flowfiles to be lost

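During an ETL process we are hitting random exceptions that cause flowfiles to be lost. NiFi is deployed on a 3-node Kubernetes cluster, and the repositories are on a shared file system (GlusterFS). We ran some stress tests: out of 2000 CSV files being processed, almost 10% were lost, each with the exception reported below. We also tried scaling down to a single node and setting the number of concurrent threads to 1, to minimize concurrency problems on the processors involved (validatecsv and validatejsonpath). The processors seem to try to access the flowfile content at a later point, when it can no longer be found. The problem is not systematic; it occurs randomly. It happens on 1.8, and upgrading to the latest stable release, 1.9.2, did not help either.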

This is the exception that occurs. Thanks for any help.

    2019-11-11 08:34:10,011 ERROR [Timer-Driven Process Thread-7] o.a.n.p.standard.CompressContent CompressContent[id=b634d291-6f29-389e-b481-3539828a2205] CompressContent[id=b634d291-6f29-389e-b481-3539828a2205] failed to process session due to org.apache.nifi.processor.exception.MissingFlowFileException: Unable to find content for FlowFile; Processor Administratively Yielded for 1 sec: org.apache.nifi.processor.exception.MissingFlowFileException: Unable to find content for FlowFile
org.apache.nifi.processor.exception.MissingFlowFileException: Unable to find content for FlowFile
    at org.apache.nifi.controller.repository.StandardProcessSession.handleContentNotFound(StandardProcessSession.java:3132)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2926)
    at org.apache.nifi.processors.standard.CompressContent.onTrigger(CompressContent.java:236)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=1573461249850-433, container=default, section=433], offset=8002, length=4957]: Stream contained only 0 bytes but should have contained 4957
    at org.apache.nifi.controller.repository.io.FlowFileAccessInputStream.ensureAllContentRead(FlowFileAccessInputStream.java:49)
    at org.apache.nifi.controller.repository.io.FlowFileAccessInputStream.read(FlowFileAccessInputStream.java:84)
    at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:68)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.nifi.processors.standard.CompressContent$1.process(CompressContent.java:312)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2908)
    ... 12 common frames omitted
2019-11-11 08:34:10,013 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding CompressContent[id=b634d291-6f29-389e-b481-3539828a2205] due to uncaught Exception: org.apache.nifi.processor.exception.MissingFlowFileException: Unable to find content for FlowFile
org.apache.nifi.processor.exception.MissingFlowFileException: Unable to find content for FlowFile
    at org.apache.nifi.controller.repository.StandardProcessSession.handleContentNotFound(StandardProcessSession.java:3132)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2926)
    at org.apache.nifi.processors.standard.CompressContent.onTrigger(CompressContent.java:236)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.nifi.controller.repository.ContentNotFoundException: Could not find content for StandardContentClaim [resourceClaim=StandardResourceClaim[id=1573461249850-433, container=default, section=433], offset=8002, length=4957]: Stream contained only 0 bytes but should have contained 4957
    at org.apache.nifi.controller.repository.io.FlowFileAccessInputStream.ensureAllContentRead(FlowFileAccessInputStream.java:49)
    at org.apache.nifi.controller.repository.io.FlowFileAccessInputStream.read(FlowFileAccessInputStream.java:84)
    at org.apache.nifi.controller.repository.io.TaskTerminationInputStream.read(TaskTerminationInputStream.java:68)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
    at java.io.FilterInputStream.read(FilterInputStream.java:107)
    at org.apache.nifi.processors.standard.CompressContent$1.process(CompressContent.java:312)
    at org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2908)
    ... 12 common frames omitted

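Comments:

What do you mean by the processors trying to access the flowfile content? Did you see any warnings or errors before the first content-not-found error?

How are the disks used for flowfile storage mounted into the Docker containers?

The disks are mounted as NFS.

We may have found a solution: there appears to be a flow and synchronization problem between the flowfiles and their content, possibly caused by the garbage collector process. Starting the processes stateless (with no volumes for the repositories) seems to solve the problem. We will try creating the volumes on local disk.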
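
For reference, the repository locations that the last comment proposes moving to local disk are set in `nifi.properties`. The fragment below is only a sketch of that idea: the property names are the standard NiFi ones, but the `/opt/nifi/local-repos` path is a hypothetical local mount chosen for illustration and does not come from the original post.

```properties
# Sketch only: point the NiFi repositories at node-local storage
# instead of the shared (GlusterFS/NFS) volume.
# /opt/nifi/local-repos is a hypothetical path; use whatever local
# volume is actually mounted into each NiFi pod.
nifi.flowfile.repository.directory=/opt/nifi/local-repos/flowfile_repository
nifi.content.repository.directory.default=/opt/nifi/local-repos/content_repository
nifi.provenance.repository.directory.default=/opt/nifi/local-repos/provenance_repository
```

Each node needs its own copy of these directories. Whether this removes the `ContentNotFoundException` in this setup is not confirmed by the thread; it is the experiment the poster said they would try next.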