Hadoop Flume: configuring the HDFS sink to write files of ~100 MB (close to the 128 MB HDFS block size)

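I'm trying to configure Flume so that it writes files at least close to the HDFS block size, which in my case is 128 MB. This is my configuration file, which writes files of about 10 MB each: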

###############################
httpagent.sources = http-source
httpagent.sinks = k1
httpagent.channels = ch3

# Define / Configure Source (multiport seems to support newer "stuff")
###############################
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 5140

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.5/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.k1.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollCount = 0
httpagent.sinks.k1.hdfs.batchSize = 10000
httpagent.sinks.k1.hdfs.rollSize = 0

httpagent.sinks.log-sink.channel = memory
httpagent.sinks.log-sink.type = logger

# Channels
###############################

httpagent.channels = ch3
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 100000
httpagent.channels.ch3.transactionCapacity = 80000
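So the problem is that I can't get it to write files of about 100 MB. If I change the configuration like this, I would expect it to write files of at least around 100 MB: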

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.4test/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.k1.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollSize = 100000000
httpagent.sinks.k1.hdfs.rollCount = 0

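But then the files got even smaller: it wrote files of about 3-8 MB. Since it isn't really possible to merge the files once they are in HDFS, I'd really like to make them bigger. Is there something I'm missing about the rollSize parameter? Or is there some default value that prevents it from ever writing files that large?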

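You need to override rollInterval and set it to 0, so that files are never rolled based on a time interval. Its default is 30 seconds, so unless you change it the sink closes the current file every 30 seconds no matter how large rollSize is: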

httpagent.sinks.k1.hdfs.rollInterval = 0
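Putting the pieces together, a sink definition that rolls on size alone might look like the sketch below. It assumes the agent, sink and channel names from the question (`httpagent`, `k1`, `ch3`); the rollSize value of 127000000 bytes is only an illustrative choice slightly under a 128 MiB (134217728-byte) block, not a recommended constant.

```properties
# Sketch: HDFS sink k1 that rolls only when the file reaches the size threshold
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.4test/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.k1.hdfs.writeFormat = Text
# Illustrative value: roll at ~127 MB, just under a 128 MiB HDFS block
httpagent.sinks.k1.hdfs.rollSize = 127000000
# Disable count-based rolling (default: 10 events)
httpagent.sinks.k1.hdfs.rollCount = 0
# Disable time-based rolling (default: 30 seconds)
httpagent.sinks.k1.hdfs.rollInterval = 0
# Must not exceed the channel's transactionCapacity (80000 here)
httpagent.sinks.k1.hdfs.batchSize = 10000
```

One trade-off of disabling both count- and time-based rolling: under low traffic a file can stay open for a long time. If that matters, `hdfs.idleTimeout` (disabled by default) can be set to close files that have received no events for that many seconds.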

Thank you very much. I don't know why I struggled with this for so long; I read the documentation several times, but I just didn't see it. Thank you!! :D