Retrieving data from Twitter using Flume and storing it in HDFS in JSON format

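I am trying to retrieve data from Twitter using Flume and store it in HDFS in JSON format. The data is being loaded into HDFS, but not in JSON format.

I have attached a few lines from the HDFS file that was stored from Twitter: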

Objavro.schema\E4
{"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}\00\E0D\C9H\B8$\DCb,C\8A5y\D1n\CE$733267766577356800\00\96\00Zumaran \00\C6C.A.B//C.A.H
Wsp:351 220-1251
Fb:Ramiro Pedernera✌
Insta:Ramiropedernera
Snapp:ramipedernera12\00\B2\9E\00\B2(\00(DIVI^Lista RAMIRO P.\00RamiPedernera12\00(2016-05-19T17:37:13Z\00tGaray culiadaso me metió una patada en la frente

The events from Flume's TwitterSource are in Avro format by default. To change that, you would have to modify the source files of the TwitterSource so that it emits the tweets in raw format (JSON). Fortunately, Cloudera has already done that here: https://github.com/cloudera/cdh-twitter-example
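
The `Objavro.schema` prefix visible in the dump from the question is the start of an Avro object container file, which always begins with the four bytes `O`, `b`, `j`, `0x01`. As a rough sketch, the check below tells whether a local file has that header. The function name `is_avro_file` is made up for illustration, and `head -c` is assumed to be available (it is on common Linux and macOS systems).

```shell
# is_avro_file FILE
# Succeeds (exit status 0) if FILE starts with the Avro container magic
# bytes "Obj" followed by 0x01, and fails otherwise.
# Hypothetical helper, not part of Flume or Avro.
is_avro_file() {
    # Read the first 4 bytes and render them as a plain hex string.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "4f626a01" ]
}
```

For a file stored in HDFS, one option is to copy it locally first (for example with `hdfs dfs -get`) and pass the local copy to the function. If the check succeeds, the sink is still receiving Avro events rather than raw JSON.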

All you have to do is install the libraries for the new TwitterSource following the steps in the README, and change TwitterAgent.sources.Twitter.type in the Flume config file to com.cloudera.flume.source.TwitterSource. There is an example config file in the same project.

Hope it helps

To change from Avro to JSON format, you have to follow a few steps:

In your config file, find the property

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource

Replace that value with the Cloudera class:

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource

com.cloudera.flume.source.TwitterSource is a custom class that writes the records to HDFS in JSON format.
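
The property change can also be scripted. The sketch below rewrites the `TwitterAgent.sources.Twitter.type` line in place with `sed` and keeps a `.bak` copy of the original file. The function name `set_twitter_source_type` is hypothetical, and it assumes the property sits on a single line of the form `key = value`.

```shell
# set_twitter_source_type CONF
# Point TwitterAgent.sources.Twitter.type at the Cloudera TwitterSource class.
# Leaves the original file next to it as CONF.bak.
# Hypothetical helper; assumes the property is on one "key = value" line.
set_twitter_source_type() {
    sed -i.bak \
        's|^\(TwitterAgent\.sources\.Twitter\.type[[:space:]]*=[[:space:]]*\).*|\1com.cloudera.flume.source.TwitterSource|' \
        "$1"
}
```

It would be called with the path of your own Flume config file, which depends on the installation. Restart the agent afterwards so that the new source class is picked up.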

To get this class, download the flume-sources folder from the project to your local machine and build the JAR file from it.

  • To build the flume-sources JAR:

    $ cd flume-sources

    $ mvn package

    $ cd ..

  • This will generate a file named flume-sources-1.0-SNAPSHOT.jar in the target directory.

  • Add the JAR to the Flume classpath:
  • Copy flume-sources-1.0-SNAPSHOT.jar to both
    /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar
    /var/lib/flume-ng/plugins.d/twitter-streaming/lib/flume-sources-1.0-SNAPSHOT.jar

    If these directories do not exist, create them with

    sudo mkdir -p /usr/lib/flume-ng/plugins.d/twitter-streaming/lib/

    sudo mkdir -p /var/lib/flume-ng/plugins.d/twitter-streaming/lib/

    For more information, see the project's README.

    Hope this helps!
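
The copy step can be wrapped in a small function, sketched below, that creates both plugins.d directories if they are missing and then copies the built JAR into each. The name `install_flume_jar` and the optional ROOT prefix argument are assumptions added for illustration (the prefix makes a dry run in a scratch directory possible). The two target paths are the ones listed in the answer.

```shell
# install_flume_jar JAR [ROOT]
# Copy JAR into both Flume plugins.d locations, creating them if needed.
# ROOT is an optional path prefix (empty by default) for trying this out
# in a scratch directory. Hypothetical helper, not shipped with Flume.
install_flume_jar() {
    jar=$1
    root=${2:-}
    for base in /usr/lib/flume-ng /var/lib/flume-ng; do
        dest="$root$base/plugins.d/twitter-streaming/lib"
        mkdir -p "$dest" || return 1
        cp "$jar" "$dest/" || return 1
    done
}
```

Without a ROOT prefix the function writes under /usr/lib and /var/lib, so it normally has to run with root privileges, as the `sudo mkdir` commands in the answer suggest.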