
Hadoop Flume is not writing Twitter data to the /tmp/xx folder


I am using Flume to load Twitter data into an HDFS location. The flume-ng command runs successfully and prints messages like the following:

18/06/24 22:52:33 INFO twitter.TwitterSource: Processed 17,500 docs
18/06/24 22:52:37 INFO twitter.TwitterSource: Processed 17,600 docs
18/06/24 22:52:39 INFO hdfs.BucketWriter: Closing hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp
18/06/24 22:52:39 INFO hdfs.BucketWriter: Renaming hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675.tmp to hdfs://localhost:8020/tmp/pk/FlumeData.1529905355675
18/06/24 22:52:39 INFO hdfs.HDFSEventSink: Writer callback called.
18/06/24 22:52:40 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/06/24 22:52:40 INFO hdfs.BucketWriter: Creating hdfs://localhost:8020/tmp/pk/FlumeData.1529905960074.tmp
18/06/24 22:52:40 INFO twitter.TwitterSource: Processed 17,700 docs
18/06/24 22:52:44 INFO twitter.TwitterSource: Processed 17,800 docs
18/06/24 22:52:47 INFO twitter.TwitterSource: Processed 17,900 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Processed 18,000 docs
18/06/24 22:52:51 INFO twitter.TwitterSource: Total docs indexed: 18,000, total skipped docs: 0
18/06/24 22:52:51 INFO twitter.TwitterSource:     29 docs/second
18/06/24 22:52:51 INFO twitter.TwitterSource: Run took 618 seconds and processed:
18/06/24 22:52:51 INFO twitter.TwitterSource:     0.008 MB/sec sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource:     4.859 MB text sent to index
18/06/24 22:52:51 INFO twitter.TwitterSource: There were 0 exceptions ignored: 
18/06/24 22:52:54 INFO twitter.TwitterSource: Processed 18,100 docs
18/06/24 22:52:57 INFO twitter.TwitterSource: Processed 18,200 docs
18/06/24 22:53:00 INFO twitter.TwitterSource: Processed 18,300 docs
18/06/24 22:53:04 INFO twitter.TwitterSource: Processed 18,400 docs
18/06/24 22:53:07 INFO twitter.TwitterSource: Processed 18,500 docs
18/06/24 22:53:10 INFO twitter.TwitterSource: Processed 18,600 docs
18/06/24 22:53:14 INFO twitter.TwitterSource: Processed 18,700 docs
18/06/24 22:53:17 INFO twitter.TwitterSource: Processed 18,800 docs
18/06/24 22:53:21 INFO twitter.TwitterSource: Processed 18,900 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Processed 19,000 docs
18/06/24 22:53:24 INFO twitter.TwitterSource: Total docs indexed: 19,000, total skipped docs: 0
18/06/24 22:53:24 INFO twitter.TwitterSource:     29 docs/second
However, no files are generated in the output hdfs folder, and no exception is thrown either.

Can someone help me out?

Below is the conf file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

# Use Cloudera Twitter Source;
# place your consumerKey and accessToken details here
# Describing/Configuring the source
#TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey=xxx
TwitterAgent.sources.Twitter.consumerSecret=xxx
TwitterAgent.sources.Twitter.accessToken=xxx
TwitterAgent.sources.Twitter.accessTokenSecret=xxx
TwitterAgent.sources.Twitter.maxBatchSize = 1000
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 1000
TwitterAgent.sources.Twitter.keywords=harry kane
# Use a channel which buffers events in memory
TwitterAgent.channels.MemChannel.type=memory
TwitterAgent.channels.MemChannel.capacity=100
TwitterAgent.channels.MemChannel.transactionCapacity=100

# Describing/Configuring the sink 
TwitterAgent.sinks.HDFS.type=hdfs
TwitterAgent.sinks.HDFS.hdfs.path=hdfs://localhost:8020/tmp/pk
TwitterAgent.sinks.HDFS.hdfs.fileType=DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat=Text
TwitterAgent.sinks.HDFS.hdfs.batchSize=100
TwitterAgent.sinks.HDFS.hdfs.rollSize=0
TwitterAgent.sinks.HDFS.hdfs.rollCount=1000
TwitterAgent.sinks.HDFS.hdfs.rollInterval=600

# Bind the source and sink to the channel
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
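
For reference, an agent with this config would be started with a flume-ng invocation along these lines (the file name twitter.conf and the conf directory path are assumptions; the agent name must match the TwitterAgent prefix used in the properties above):

```shell
# Launch the Flume agent defined in the config above.
# --name must match the property prefix (TwitterAgent);
# the logger override prints INFO logs to the console for debugging.
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file twitter.conf \
  --name TwitterAgent \
  -Dflume.root.logger=INFO,console
```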

Are you checking the HDFS /tmp/pk directory, or the local /tmp/pk?

Yes, I am checking the local /tmp/pk folder. If I change the path to 'hdfs://localhost:8020/home/cloudera/flume', it throws an error: Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:279)

Does the /home/cloudera/flume folder exist on HDFS? I suggest you read up on Hadoop/HDFS. HDFS is not the local filesystem.