Java StreamingFileSink not ingesting data into S3

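I created a simple ingestion service that picks up local files and ingests them into S3 using StreamingFileSink.

I set everything up according to the documentation, but it doesn't work. I tested a different sink location, a local on-prem path, and the files do arrive there (but hidden, as .part files).

Does this mean the part files are also being sent to S3, but are not visible?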

I'm looking for the part files in S3. Or do I need to change something in StreamingFileSink so that part files are rolled at a minimal size?

09:37:39387 INFO org.apache.flink.runtime.checkpoint.checkpoint Coordinator-完成作业34d46d2671c996d6150d88a2f74b4218的检查点1(38毫秒7558字节)。
09:37:39388 INFO org.apache.flink.streaming.api.functions.sink.filesystem.bucket-子任务0收到id为1的检查点的完成通知。
09:37:39389 INFO org.apache.flink.streaming.api.functions.sink.filesystem.bucket-子任务1收到id为1的检查点的完成通知。
09:37:39390 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务2收到id为1的检查点的完成通知。
09:37:39391 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务3收到id为1的检查点的完成通知。
09:37:39391 INFO org.apache.flink.fs.s3.common.writer.s3提交人-提交//第1-0部分,MPU ID为CEYMmUslgCnA2KcD5pslz.7dpaQuCAqmTJo6oDPv7P.rj45o4thrvtfdqmabxrvdwstwo2roir.r9VP2s4IMxlPtHz9r6CP_iQ7.dcp9ygdljing1galptunahvgugen
09:37:39391 INFO org.apache.flink.fs.s3.common.writer.s3提交人-提交//第0-0部分,MPU ID为ExM_zvvxhgNakuesqrklftm3hytoopaxdet1moxbejyxlejbyxfmespk7b.elmoydrmgotnpzagmh6lgyso2hfjtozltpcolyjvot3tkrec8yqsaj
09:37:39391 INFO org.apache.flink.fs.s3.common.writer.s3提交人-提交//MPU ID为64的第2-0部分。_ocicewpawrmrri_lxckyefqytisksslsheajgxwgdpf3qth0qvom2c3k8s2l6udj8yzfm9yejhopgqirl0hmfokcyma49bzubhgm3kqmicve9coniteb4etnejca
09:37:39393 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务4收到id为1的检查点的完成通知。
09:37:39394 INFO org.apache.flink.fs.s3.common.writer.s3提交人-提交//第3-0部分,MPU ID YUFGVFH9YOL36MUUTIAYLEHCMYQGRYOBV0BBE.E3UCIKLYLI6S4RFNCGTFST2PJIEJQ97BFTFTMYCP4WGW5KX4JSRMZAFK.KQIYNMUEWWCOLKMWOKTVWHVMSPB
09:37:39394 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务5收到id为1的检查点的完成通知。
09:37:39395 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务6收到id为1的检查点的完成通知。
09:37:39394 INFO org.apache.flink.fs.s3.common.writer.s3提交人-提交//第4-0部分,MPU ID为Ab7sTpLJp3fNCCYVXe2nUO5qWmYxMeYQlOssRpeawoY2LDV.a58eShdp.anfe6yxtnviewcmrekiysgujs2slbxwnrph2ax50ncxusdfkyvazymuqjjbztxgdyw
09:37:39395 INFO org.apache.flink.fs.s3.common.writer.s3提交者-提交//part-5-0,MPU ID xDBOUVLHPX7QRFRS9Y93LC7WWO20L5MXKTCWFBAMAVKTWZEIGEU2BU5H2NNCRZWBCPDMEPSDOBK64Lvos8TxUhlftq_nkbFxis2K6OY6NuttisdG4SRWWC6RM
09:37:39395 INFO org.apache.flink.streaming.api.functions.sink.filesystem.Buckets-子任务7收到id为1的检查点的完成通知。
09:37:39397 INFO org.apache.flink.fs.s3.common.writer.s3提交者-提交//第6-0部分,MPU ID为0UZ35xRL2SHWXZL5NLY3Z1KHTSHBSQHIAJ6HZ9CBZFGXFIF7BWRNJDGHQHHHWPS9N0WFCPQXB12XBNENJQ6CLCX0xZRGGHGKUGEWHFEBIOURRO8xUVMT1OT7GXIY

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
String path = "/tmp/component_test";

MyFileInputFormat myFileInputFormat = new MyFileInputFormat(new Path(path));
myFileInputFormat.setNumSplits(1);

ContinuousFileMonitoringFunction<String> monitoringFunction =
        new ContinuousFileMonitoringFunction<>(myFileInputFormat,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                env.getParallelism(), 1000);

// the monitoring source always runs with parallelism (DOP) 1
DataStream<TimestampedFileInputSplit> splits = env.addSource(monitoringFunction);

ContinuousFileReaderOperator<String> reader = new ContinuousFileReaderOperator<>(myFileInputFormat);
TypeInformation<String> typeInfo = new SimpleStringSchema().getProducedType();

// the readers can run in parallel
DataStream<String> content = splits.transform("FileSplitReader", typeInfo, reader);

SingleOutputStreamOperator<Tuple2<String, String>> ds = content.flatMap(
        new XMLSplitter());

//new Path("s3://<bucket_name>/raw/")
//new Path("file:///tmp/raw/")
StreamingFileSink<Tuple2<String, String>> sink = StreamingFileSink
        .forRowFormat(new Path("s3a://<bucket-name>/raw/"),
                (Tuple2<String, String> element, OutputStream stream) -> {
                    PrintStream out = new PrintStream(stream);
                    out.println(element.f1);
                })
        // Determine component type for each record
        .withBucketAssigner(new ComponentBucketAssigner())
        .withRollingPolicy(DefaultRollingPolicy.create().withMaxPartSize(100).withRolloverInterval(1000).build())
        .withBucketCheckInterval(100)
        .build();
ds.addSink(sink);
FileSystem.initialize(GlobalConfiguration.loadConfiguration(System.getenv("FLINK_CONF_DIR")));
env.execute();
...

09:37:39,387 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator     - Completed checkpoint 1 for job 34d46d2671c996d6150d88a2f74b4218 (7558 bytes in 38 ms).
09:37:39,388 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 0 received completion notification for checkpoint with id=1.
09:37:39,389 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 1 received completion notification for checkpoint with id=1.
09:37:39,390 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 2 received completion notification for checkpoint with id=1.
09:37:39,391 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 3 received completion notification for checkpoint with id=1.
09:37:39,391 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER1>/part-1-0 with MPU ID CEYMmUslgCnA2KcD5pslz.7dpaQuCAqmTJo6oDPv7P.Rj45O4tHrVTfDQMABxrRvdWSTwO2RoIR.r9VP2s4IMxlPtHz9r6CP_iQ7.DcP9yGDLjIN1gaLPTunAhVGuGen
09:37:39,391 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER2>/part-0-0 with MPU ID ExM_.cfOZVvXHHGNakUeshSQrkLFtm3HytooPAxDet1MoXBEJYhxlEJBYyXFmeSpk7b.ElmoydrMgotnpZAgmsh6lGhQgMYoS2hFJtOZLtPCOLyJvOt3TKRecc8YqSAJ
09:37:39,391 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER3>/part-2-0 with MPU ID 64._ocicEwPAwrMrI_LXcKyEfqYtISKsLsheAjgXwGdpf3qTH0qvOM2C3k8s2L6UDJ8yZfm9YEJhopgQIrL0hmFokCyMa49bzUbhgm3KQmiCVe9CoNiTEb4ETnEJCZFA
09:37:39,393 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 4 received completion notification for checkpoint with id=1.
09:37:39,394 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER4>/part-3-0 with MPU ID yuFGGVfh9YOL36mUUTIAyyLehCMyQGrYoabdv0BBe.e3uCIkLYLI6S4RfnCGtFsT2pjiEJq97bfftMycp4wGW5KKX4jsrmZAfK.kqiYnMUeWWcolXKmWOktVvwHvmSpB
09:37:39,394 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 5 received completion notification for checkpoint with id=1.
09:37:39,395 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 6 received completion notification for checkpoint with id=1.
09:37:39,394 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER5>/part-4-0 with MPU ID Ab7sTpLJp3fNCCYVXe2nUO5qWmYxMeYQlOssRpeawoY2LDV.a58eShdp.Anfe6YxTnVIewCmReKiYSguJS2SlBxwNRPh2ax50nCXuSdfkyVazgiNMZYMUQJjbzTxgdYW
09:37:39,395 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER6>/part-5-0 with MPU ID xDbouvLhpX7q9rFrs9y93lc7wWO20L5mxKTCWFBAmAVkTWzEiGEu2bU5H2nnCrZWbcPDMePSdpOBK64lVoS8txuhLFtq_nkBfXIs2K6OY6NuTtiSDGWi4SrWwnedC6RM
09:37:39,395 INFO  org.apache.flink.streaming.api.functions.sink.filesystem.Buckets  - Subtask 7 received completion notification for checkpoint with id=1.
09:37:39,397 INFO  org.apache.flink.fs.s3.common.writer.S3Committer              - Committing <BUCKET NAME>/<FOLDER7>/part-6-0 with MPU ID 0uZ35XrL2ShWxZL5nlY3Z1KHTSHBsQhiaJ6HZ9CbzfgxFIf7bwRNjdGHQHWPs9N0WfcpQXBM12XbNENjfILXQ6CLCx0XZrgvGHakUgeWhfeBiOURrO8xUVMT1ot7gxIY

StreamingFileSink only works when checkpointing is enabled: part files are finalized as part of a checkpoint. Enable it on the StreamExecutionEnvironment with env.enableCheckpointing(<interval in ms>). On S3 the sink writes through multipart uploads, and the final object only appears in the bucket once the upload is completed at commit time; the S3Committer "Committing <path> with MPU ID" lines above are exactly those commits, logged right after checkpoint 1 completes. Tuning the rolling policy alone does not make files appear sooner: rolling only moves a part file from in-progress to pending, and pending files are committed only when a checkpoint completes.
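The lifecycle described in the answer can be illustrated with a small, Flink-free Java model. This is a simplified sketch under stated assumptions, not Flink's implementation: the class PartFileLifecycleModel and all of its methods are hypothetical, it models a single bucket on a single subtask, and its rolling rules only roughly mirror DefaultRollingPolicy (part size checked before each write and at checkpoint time, rollover interval checked on processing time; the inactivity interval is left out). Only entries in the "finished" list correspond to objects you would actually see in the bucket.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model: in-progress -> pending -> finished for one bucket / one subtask.
class PartFileLifecycleModel {
    private final long maxPartSize;      // analogous to withMaxPartSize(...)
    private final long rolloverInterval; // analogous to withRolloverInterval(...)

    private String inProgressName;       // null when no part file is open
    private long inProgressSize;
    private long inProgressCreatedAt;
    private int partCounter;

    private final List<String> pendingForNextCheckpoint = new ArrayList<>();
    private final TreeMap<Long, List<String>> pendingPerCheckpoint = new TreeMap<>();
    private final List<String> finished = new ArrayList<>(); // "visible in S3"

    PartFileLifecycleModel(long maxPartSize, long rolloverInterval) {
        this.maxPartSize = maxPartSize;
        this.rolloverInterval = rolloverInterval;
    }

    // Before writing, roll if no part is open or the open part already exceeds the size limit.
    void write(String record, long now) {
        if (inProgressName == null || inProgressSize > maxPartSize) {
            roll(now);
        }
        inProgressSize += record.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
    }

    // Timer-driven check: close a part that has been open for too long (it becomes pending).
    void onProcessingTime(long now) {
        if (inProgressName != null && now - inProgressCreatedAt >= rolloverInterval) {
            closeInProgress();
        }
    }

    // Checkpoint barrier: roll an oversized part, then tie all pending files to this checkpoint.
    void snapshotState(long checkpointId) {
        if (inProgressName != null && inProgressSize > maxPartSize) {
            closeInProgress();
        }
        if (!pendingForNextCheckpoint.isEmpty()) {
            pendingPerCheckpoint.put(checkpointId, new ArrayList<>(pendingForNextCheckpoint));
            pendingForNextCheckpoint.clear();
        }
    }

    // Only here do pending files become finished (for S3: the multipart upload is completed).
    void notifyCheckpointComplete(long checkpointId) {
        NavigableMap<Long, List<String>> done = pendingPerCheckpoint.headMap(checkpointId, true);
        for (List<String> files : done.values()) {
            finished.addAll(files);
        }
        done.clear(); // removes the committed entries from the backing map
    }

    List<String> visibleFiles() {
        return new ArrayList<>(finished);
    }

    private void roll(long now) {
        closeInProgress();
        inProgressName = "part-0-" + partCounter++;
        inProgressSize = 0;
        inProgressCreatedAt = now;
    }

    private void closeInProgress() {
        if (inProgressName != null) {
            pendingForNextCheckpoint.add(inProgressName);
            inProgressName = null;
        }
    }

    public static void main(String[] args) {
        // Same limits as the question's rolling policy: 100 bytes, 1000 ms.
        PartFileLifecycleModel sink = new PartFileLifecycleModel(100, 1000);
        for (int i = 0; i < 20; i++) {
            sink.write("record-" + i, i * 10L);
        }
        // part-0-0 has rolled to pending, but nothing is visible without a completed checkpoint.
        System.out.println("before checkpoint: " + sink.visibleFiles()); // prints "before checkpoint: []"
        sink.snapshotState(1);
        sink.notifyCheckpointComplete(1);
        // part-0-1 (80 bytes) is still in progress, so only part-0-0 is committed.
        System.out.println("after checkpoint 1: " + sink.visibleFiles()); // prints "after checkpoint 1: [part-0-0]"
        sink.onProcessingTime(1200); // part-0-1 was opened at t=120, so it rolls now
        sink.snapshotState(2);
        sink.notifyCheckpointComplete(2);
        System.out.println("after checkpoint 2: " + sink.visibleFiles()); // prints "after checkpoint 2: [part-0-0, part-0-1]"
    }
}
```

With the question's settings, part files roll quickly, yet in this model they still stay invisible until notifyCheckpointComplete runs, which matches the answer: without completed checkpoints, no amount of rolling produces finished files.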