Apache Kafka spooldir connector not processing a large file
I have a large file with 40 million records. The spooldir connector processed half of the records, but after that it stopped pushing records to the topic. The logs look like this:
[2021-01-07 23:08:59,903] INFO Processed 20060000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
[2021-01-07 23:08:59,997] INFO Processed 20080000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
[2021-01-07 23:09:00,225] INFO Processed 20100000 lines of /dir/dir1/abc.txt_1607697517821.txt (com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceTask:144)
[2021-01-07 23:09:04,788] INFO WorkerSourceTask{id=cust-stream-1} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
[2021-01-07 23:09:04,788] INFO WorkerSourceTask{id=cust-stream-1} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
[2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-1} Finished commitOffsets successfully in 6 ms (org.apache.kafka.connect.runtime.WorkerSourceTask:574)
[2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-0} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
[2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask:495)
[2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-2} Committing offsets (org.apache.kafka.connect.runtime.WorkerSourceTask:478)
The offset commit and flush messages shown in the last few lines keep repeating in the log.
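To check this pattern across a full worker log rather than by eye, a small script can record the last `Processed N lines` entry and count how many offset-commit cycles after it report `flushing 0 outstanding messages`. This is a minimal diagnostic sketch, assuming the log format shown in the excerpt above; the regular expressions and the `summarize` helper are hypothetical and are not part of Kafka Connect or the spooldir connector.

```python
import re
from datetime import datetime

# Assumed format of SpoolDirCsvSourceTask progress lines, e.g.
# [2021-01-07 23:09:00,225] INFO Processed 20100000 lines of /dir/... (...)
PROGRESS = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] INFO "
    r"Processed (?P<n>\d+) lines of (?P<path>\S+)"
)
# Assumed format of WorkerSourceTask flush lines, e.g.
# [...] INFO WorkerSourceTask{id=cust-stream-1} flushing 0 outstanding messages ...
FLUSH = re.compile(
    r"\[(?P<ts>[^\]]+)\] INFO WorkerSourceTask\{id=(?P<task>[^}]+)\} "
    r"flushing (?P<n>\d+) outstanding messages"
)


def parse_ts(s):
    """Parse the log timestamp; %f accepts the 3-digit millisecond field."""
    return datetime.strptime(s, "%Y-%m-%d %H:%M:%S,%f")


def summarize(lines):
    """Return (last_progress, idle_flushes).

    last_progress is (timestamp, line_count, path) for the most recent
    progress line, or None if there was none. idle_flushes counts flush
    lines with 0 outstanding messages seen after that progress line.
    """
    last_progress = None
    idle_flushes = 0
    for line in lines:
        m = PROGRESS.search(line)
        if m:
            last_progress = (parse_ts(m["ts"]), int(m["n"]), m["path"])
            idle_flushes = 0  # new progress resets the idle counter
            continue
        m = FLUSH.search(line)
        if m and last_progress is not None and int(m["n"]) == 0:
            idle_flushes += 1
    return last_progress, idle_flushes


if __name__ == "__main__":
    sample = [
        "[2021-01-07 23:09:00,225] INFO Processed 20100000 lines of "
        "/dir/dir1/abc.txt_1607697517821.txt (SpoolDirCsvSourceTask:144)",
        "[2021-01-07 23:09:04,788] INFO WorkerSourceTask{id=cust-stream-1} "
        "flushing 0 outstanding messages for offset commit (WorkerSourceTask:495)",
        "[2021-01-07 23:09:04,795] INFO WorkerSourceTask{id=cust-stream-0} "
        "flushing 0 outstanding messages for offset commit (WorkerSourceTask:495)",
    ]
    (ts, count, path), idle = summarize(sample)
    print(f"last progress: {count} lines of {path} at {ts}")
    print(f"idle offset-commit flushes since then: {idle}")
```

An idle count that keeps growing while no new progress lines appear is consistent with the worker still running its periodic offset commits while the task produces no new records from the file. It does not by itself explain why the file reader stalled.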
The file abc_1607697517821.txt.PROCESSING still exists, which indicates that processing has not finished.

If SpoolDir does not fit your need to process large files, you can still use Connect FilePulse as an alternative: