Unknown file format for tweets from Flume

Tags: twitter, cloudera, flume, flume-ng, flume-twitter

I am trying to fetch tweets using Flume. I am working with Cloudera.

I am using the provided Twitter source.

Here is my configuration file:

TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS
#
TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <>
TwitterAgent.sources.Twitter.consumerSecret = <>
TwitterAgent.sources.Twitter.accessToken = <>
TwitterAgent.sources.Twitter.accessTokenSecret = <>
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientist, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing

TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://localhost:8020/user/root/flume/tweet/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.callTimeout = 180000


TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
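It seems to work fine: after processing, I found several files in the file system, such as FlumeData.1523723075629.

However, the format of these files is unknown; I expected them to be JSON. I opened one of them in Notepad++ and could see tweets, but the structure was unclear. In addition, the tweets are not based on the keywords specified in the configuration file. How can I solve these problems? How do I get the correct file format, and how do I get the right tweets?

Thanks in advance.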

Thanks, it is solved. I changed TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource to TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

sudo /usr/bin/flume-ng agent -c /usr/lib/flume-ng/conf -f /usr/lib/flume-ng/conf/flume-conf.properties -Dflume.root.logger=INFO,console -n TwitterAgent
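
Some context on why the switch likely helped: as far as I know, the Flume user guide describes org.apache.flume.source.twitter.TwitterSource as an experimental source that reads the 1% sample stream and converts tweets to Avro. That would explain both the unreadable files and the ignored keywords. To check what a given FlumeData file actually contains, something like the following Python sketch may help. It assumes the file has already been copied from HDFS to the local disk; the function name sniff_format and the file path in the usage note are made up for illustration.

```python
import json

# Every Avro object container file starts with these four bytes.
AVRO_MAGIC = b"Obj\x01"


def sniff_format(path):
    """Guess the format of a local copy of a Flume output file.

    Returns 'avro', 'json-lines', or 'unknown'. This is a rough heuristic,
    not a validator: it only looks at the first bytes and the first line.
    """
    with open(path, "rb") as f:
        head = f.read(4)
        if head == AVRO_MAGIC:
            return "avro"
        f.seek(0)
        first_line = f.readline()

    try:
        record = json.loads(first_line.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return "unknown"

    # A tweet written as JSON text should parse to an object (dict).
    return "json-lines" if isinstance(record, dict) else "unknown"
```

For example, after copying a file locally with hdfs dfs -get, calling sniff_format("FlumeData.1523723075629") should return 'avro' for output from the Apache source if the assumption above holds, and 'json-lines' if each line is a JSON tweet.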