
Apache Flink: How do I write Avro files to S3 from Flink?


I want to read streaming data from a Kafka topic and write it to S3 in Avro or Parquet format. The stream consists of JSON strings, but I have not been able to convert them and write them to S3 as Avro or Parquet.

I found some code snippets and tried:

val sink = StreamingFileSink
  .forBulkFormat(new Path(outputS3Path), ParquetAvroWriters.forReflectRecord(classOf[myClass]))
  .build()

But on addSink I get: "Type mismatch, expected: SinkFunction[String], actual: StreamingFileSink[TextOut]".

val stream = env
  .addSource(myConsumerSource)
  .addSink(sink)


Please help, thanks.
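The type mismatch arises because the Kafka source produces a DataStream[String] while the sink expects myClass elements, so the JSON strings have to be deserialized before the sink. A minimal sketch of that bridging step, where parseJson is a hypothetical helper standing in for whatever JSON library you use (e.g. Jackson or circe):

import org.apache.flink.streaming.api.scala._

// the Kafka source yields raw JSON strings
val jsonStream: DataStream[String] = env.addSource(myConsumerSource)

// parseJson is a placeholder: deserialize each JSON string into myClass
val recordStream: DataStream[myClass] = jsonStream.map(json => parseJson(json))

// the element type now matches StreamingFileSink[myClass], so addSink compiles
recordStream.addSink(sink)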

Solution: you can use AWS Kinesis Firehose. After basic ETL, convert the Flink Table SQL query results to strings and write them to Kinesis; a Firehose delivery stream (configured from the AWS console) then writes them to S3 as Parquet.

Kafka example:

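A minimal sketch of the Flink side of that pipeline, assuming the flink-connector-kafka and flink-connector-kinesis dependencies are on the classpath; the topic, stream names, region, and bootstrap servers are placeholders:

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants

val env = StreamExecutionEnvironment.getExecutionEnvironment

// read the raw JSON strings from Kafka
val kafkaProps = new Properties()
kafkaProps.setProperty("bootstrap.servers", "localhost:9092")
val source = new FlinkKafkaConsumer[String]("my-topic", new SimpleStringSchema(), kafkaProps)

// write the (possibly ETL'd) strings to a Kinesis stream; a Firehose
// delivery stream attached to it converts the records to Parquet on S3
val kinesisProps = new Properties()
kinesisProps.setProperty(AWSConfigConstants.AWS_REGION, "us-east-1")
val sink = new FlinkKinesisProducer[String](new SimpleStringSchema(), kinesisProps)
sink.setDefaultStream("my-kinesis-stream")
sink.setDefaultPartition("0")

env.addSource(source).addSink(sink)
env.execute("kafka-to-kinesis")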

Here is the code I used to store Parquet files to the local filesystem:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.flink.core.fs.Path
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
import org.apache.flink.streaming.api.datastream.DataStreamSource
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink

val env = StreamExecutionEnvironment.getExecutionEnvironment()
// StreamingFileSink only commits finished part files on checkpoints,
// so checkpointing must be enabled
env.enableCheckpointing(100)

// Avro schema with a single required string field "message"
val schema: Schema = SchemaBuilder
  .record("record")
  .fields()
  .requiredString("message")
  .endRecord()

// genericRecordList is a pre-built collection of GenericRecord conforming to the schema
val stream: DataStreamSource[GenericRecord] = env.fromCollection(genericRecordList)

val path = new Path(s"/tmp/flink-parquet-${System.currentTimeMillis()}")
val sink: StreamingFileSink[GenericRecord] = StreamingFileSink
  .forBulkFormat(path, ParquetAvroWriters.forGenericRecord(schema))
  .build()

stream.addSink(sink)
env.execute()
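
To write to S3 instead of the local filesystem, only the path needs to change. A minimal sketch, assuming the flink-s3-fs-hadoop filesystem plugin is on the classpath, AWS credentials are configured (e.g. in flink-conf.yaml), and "my-bucket" is a placeholder bucket name:

// same bulk-format sink, but targeting S3 via the s3a:// scheme
val s3Path = new Path("s3a://my-bucket/flink-parquet")
val s3Sink: StreamingFileSink[GenericRecord] = StreamingFileSink
  .forBulkFormat(s3Path, ParquetAvroWriters.forGenericRecord(schema))
  .build()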
