Log data collected with Flume Avro is not stored correctly in Hive

I'm using Flume 1.5.0 to collect logs from application servers. Suppose I have three application servers, App-A, App-B and App-C, and one HDFS server on which Hive is running. A Flume agent runs on each of the three application servers and forwards log messages to the HDFS server, where another Flume agent receives them and finally writes the logs to the Hadoop file system. I have created an external Hive table to map this log data. Everything works, except that Hive does not parse the log data correctly when storing it in the table.

Here are my Flume and Hive configurations:

Dummy log file format (| separated): ClientId | App request | URL

Flume configuration on the app servers:

app-agent.sources = tail
app-agent.channels = memoryChannel 
app-agent.sinks = avro-forward-sink 

app-agent.sources.tail.type = exec 
app-agent.sources.tail.command = tail -F /home/kuntal/practice/testing/application.log
app-agent.sources.tail.channels = memoryChannel


app-agent.channels.memoryChannel.type = memory
app-agent.channels.memoryChannel.capacity = 100000
app-agent.channels.memoryChannel.transactioncapacity = 10000

app-agent.sinks.avro-forward-sink.type = avro 
app-agent.sinks.avro-forward-sink.hostname = localhost
app-agent.sinks.avro-forward-sink.port = 10000
app-agent.sinks.avro-forward-sink.channel = memoryChannel

Flume configuration on the HDFS server:

hdfs-agent.sources = avro-collect
hdfs-agent.channels = memoryChannel 
hdfs-agent.sinks = hdfs-write 

hdfs-agent.sources.avro-collect.type = avro 
hdfs-agent.sources.avro-collect.bind = localhost
hdfs-agent.sources.avro-collect.port = 10000 
hdfs-agent.sources.avro-collect.channels = memoryChannel

hdfs-agent.channels.memoryChannel.type = memory
hdfs-agent.channels.memoryChannel.capacity = 100000
hdfs-agent.channels.memoryChannel.transactioncapacity = 10000

hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.type = hdfs 
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
hdfs-agent.sinks.hdfs-write.rollInterval = 30

Hive external table:

CREATE EXTERNAL TABLE IF NOT EXISTS test(clientId int, itemType string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/flume/tail_table/avro';

Please suggest what I should do. Do I need to use AvroSerDe on the Hive side?

There is a small typo in the Flume configuration file on the HDFS server side. Every HDFS sink setting must include hdfs in the property name, so

hdfs-agent.sinks.hdfs-write.rollInterval=30

must be

hdfs-agent.sinks.hdfs-write.hdfs.rollInterval=30

Refer to the Flume documentation for more information.
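For reference, a corrected sink section with the hdfs. prefix applied consistently might look like the sketch below (reusing the agent, sink and path names from the question):

hdfs-agent.sinks.hdfs-write.type = hdfs
hdfs-agent.sinks.hdfs-write.channel = memoryChannel
hdfs-agent.sinks.hdfs-write.hdfs.path = hdfs://localhost:9000/user/flume/tail_table/avro
# sink-specific options such as rollInterval need the hdfs. prefix, otherwise they are ignored and the defaults are used
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30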


Now check whether the files arriving on the HDFS side are correct. Print the file contents with the cat command (or similar) and verify that they contain only the text you intended to send. If the output also contains other garbage, there is still something wrong in the configuration file.
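For example, assuming the default FlumeData file prefix and the target directory from the question, the files can be inspected with:

hdfs dfs -ls /user/flume/tail_table/avro
hdfs dfs -cat /user/flume/tail_table/avro/FlumeData.*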

The following 3 additional settings were missing from the HDFS sink:

hdfs-agent.sinks.hdfs-write.hdfs.fileType = DataStream
hdfs-agent.sinks.hdfs-write.hdfs.writeFormat = Text
hdfs-agent.sinks.hdfs-write.hdfs.rollInterval = 30 
Without fileType = DataStream and writeFormat = Text the HDFS sink writes SequenceFiles by default, so the data was not stored in HDFS as plain text and Hive could not load it into the table. With these settings added, it now works fine.
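Once the sink writes plain text, a quick sanity check against the external table from the question could be (a hypothetical verification query, not part of the original answer):

-- should now return parsed rows instead of NULLs
SELECT clientId, itemType FROM test LIMIT 10;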