Hadoop: streaming data into HDFS with Apache Flume

Currently I am using Apache Flume to pull Twitter data and want to load it into Hadoop HDFS. Below is my Twitter ingestion configuration:

# Naming the components on the current agent. 
TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS


# Describing/Configuring the source 
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = 
TwitterAgent.sources.Twitter.consumerSecret = 
TwitterAgent.sources.Twitter.accessToken = 
TwitterAgent.sources.Twitter.accessTokenSecret = 
TwitterAgent.sources.Twitter.keywords = hadoop

# Describing/Configuring the sink 
#TwitterAgent.sinks.LoggerSink.type = logger  

# Describing/Configuring the sink 
TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:9000/user/hadoop/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text 
TwitterAgent.sinks.HDFS.hdfs.batchSize = 10000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 100000 
 
# Describing/Configuring the channel 
TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 100000 
TwitterAgent.channels.MemChannel.transactionCapacity = 10000
  
# Binding the source and sink to the channel 
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel 
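
For reference, the target directory from hdfs.path above can be created up front (assuming the same path and a NameNode at localhost:9000; adjust for your layout):

hdfs dfs -mkdir -p /user/hadoop/twitter_data
hdfs dfs -ls /user/hadoop/twitter_data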

But when I run the command below from the Flume folder to start the Apache Flume agent:

bin/flume-ng agent --conf conf --conf-file conf/twitter.conf --name TwitterAgent -Dflume.root.logger=INFO,console

I get the following HDFS error:

2020-09-18 18:13:54,900 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:572)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
Exception in thread "SinkRunner-PollingRunner-DefaultSinkProcessor" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
    at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1703)
    at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:572)
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:412)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
    at java.lang.Thread.run(Thread.java:748)
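
For context, the missing Preconditions.checkArgument(boolean, String, Object) overload usually points to an old Guava jar being picked up on the classpath, so comparing the Guava versions bundled with Flume and Hadoop may help (FLUME_HOME and HADOOP_HOME are assumed to point at the respective install directories):

find $FLUME_HOME/lib -name "guava-*.jar"
find $HADOOP_HOME/share/hadoop -name "guava-*.jar"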
Can anyone suggest what might be going wrong? Thanks a lot in advance.