
Hadoop: Flume to HDFS splits a file into multiple files


I am trying to transfer a 700 MB log file from Flume to HDFS. I have configured the Flume agent as follows:

...
tier1.channels.memory-channel.type = memory
...
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.path = hdfs://***
tier1.sinks.hdfs-sink.fileType = DataStream
tier1.sinks.hdfs-sink.rollSize = 0
The source is a spooldir, the channel is memory, and the sink is hdfs.

I also tried sending a 1 MB file, and Flume split it into 1000 files of about 1 KB each. Another thing I noticed is that the transfer is very slow: about one minute per 1 MB. Am I doing something wrong?

You need to disable count-based rolling as well, which is done with the following settings:

tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300
rollCount = 0 prevents rolling by event count, and rollInterval is set here to 300 seconds (setting it to 0 would disable time-based rolling as well). You have to keep at least one roll mechanism enabled, otherwise Flume will only close the file when the agent shuts down.

The defaults are as follows:

hdfs.rollInterval   30     Number of seconds to wait before rolling the current file (0 = never roll based on time interval)
hdfs.rollSize       1024   File size to trigger a roll, in bytes (0 = never roll based on file size)
hdfs.rollCount      10     Number of events written to the file before it is rolled (0 = never roll based on number of events)
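Putting the answer together, a minimal sketch of the sink section could look like the following. Note that all HDFS sink properties carry the `hdfs.` component after the sink name (e.g. `hdfs-sink.hdfs.rollSize`, not `hdfs-sink.rollSize`). The path and host here are placeholders, not from the original question:

```
# Sketch: roll a new HDFS file every 300 s; never roll by size or event count.
tier1.sinks.hdfs-sink.channel = memory-channel
tier1.sinks.hdfs-sink.type = hdfs
tier1.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs
tier1.sinks.hdfs-sink.hdfs.fileType = DataStream
tier1.sinks.hdfs-sink.hdfs.rollSize = 0
tier1.sinks.hdfs-sink.hdfs.rollCount = 0
tier1.sinks.hdfs-sink.hdfs.rollInterval = 300
```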

Thanks, that worked. But now, after applying this configuration, I can't transfer any more files: Flume reports a successful transfer, but I can't see the file in HDFS. Any suggestions?

While Flume is writing events to HDFS, you will always see a temporary file in the target directory (you can control the file naming with various prefixes and suffixes). Once the file is fully written, you should also see the HDFS sink close it. A good starting point is to set the log4j log level to DEBUG.
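For the debugging suggestion above, one way to raise the log level (a sketch; the conf-file path and agent name are assumptions based on the question's `tier1` prefix) is to pass the logger setting when launching the agent:

```
flume-ng agent \
  --conf ./conf \
  --conf-file ./conf/flume.conf \
  --name tier1 \
  -Dflume.root.logger=DEBUG,console
```

With DEBUG logging on the console you can watch the HDFS sink open, write, and close files, which makes it easier to see whether the temporary file is ever finalized.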