
Hive cannot load tweet data in Avro format

Tags: hive, avro, flume, hortonworks-data-platform, flume-twitter

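I'm working with HDP (Hortonworks), trying to collect tweets through Flume and query the stored data from Hive.

The problem is that
SELECT * FROM tweetsavro LIMIT 1;
works, but
SELECT * FROM tweetsavro LIMIT 2;
fails with: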

Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
What I did is as follows.

twitter.conf

TwitterAgent.sources = Twitter 
TwitterAgent.channels = MemChannel 
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx

TwitterAgent.sinks.HDFS.type = hdfs 
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://sandbox.hortonworks.com:8020/user/flume/twitter_data/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream 
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0 
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.sinks.HDFS.serializer = Text

TwitterAgent.channels.MemChannel.type = memory 
TwitterAgent.channels.MemChannel.capacity = 10000 
TwitterAgent.channels.MemChannel.transactionCapacity = 1000

TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sinks.HDFS.channel = MemChannel
twitter.avsc was created with the following command:

java -jar avro-tools-1.7.7.jar getschema FlumeData.1503479843633 > twitter.avsc
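For reference, getschema reads the schema from the `avro.schema` entry in the file's header metadata. The sketch below reads that header directly in Python, assuming the standard Avro object container layout (4 magic bytes, a metadata map, then a 16-byte sync marker). The function names are illustrative and not part of avro-tools.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of data while reading bytes")
    return data


def read_header(stream):
    """Return (metadata dict, sync marker) from the start of a container file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:
            read_long(stream)  # a negative count is followed by the block size in bytes
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, stream.read(16)


def schema_from_bytes(data):
    """Parse the writer schema embedded in the header of `data`."""
    meta, _sync = read_header(io.BytesIO(data))
    return json.loads(meta["avro.schema"])


# Usage (file name taken from the command above):
# with open("FlumeData.1503479843633", "rb") as f:
#     print(json.dumps(schema_from_bytes(f.read()), indent=2))
```

This only inspects the first header in the file; it does not decode any data blocks.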
I created a table:

CREATE TABLE tweetsavro
  ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url'='hdfs://sandbox.hortonworks.com:8020/user/flume/twitter.avsc') ;
LOAD DATA INPATH 'hdfs://sandbox.hortonworks.com:8020/user/flume/twitter_data/FlumeData.*' OVERWRITE INTO TABLE tweetsavro;
Notes:

  • I also tried an external table instead of a managed table, but the result was the same.
  • Because I use Hortonworks, I do not use Cloudera's TwitterSource.
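
One way to narrow this down is to check whether a FlumeData file is a single well-formed Avro container. The error suggests the reader hit bytes that are not a valid block header; one possible cause (an assumption, not confirmed here) is that the file holds several Avro containers written back to back, each with its own header. The Python sketch below is a heuristic for that case: it looks for the Avro magic bytes followed closely by the `avro.schema` metadata key. The function name and the 64-byte window are illustrative choices.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def find_container_headers(data, window=64):
    """Return byte offsets that look like the start of an Avro container header.

    Heuristic: the magic bytes must be followed, within `window` bytes,
    by the 'avro.schema' metadata key. Custom metadata written before
    the schema could push the key outside the window.
    """
    offsets = []
    start = 0
    while True:
        pos = data.find(AVRO_MAGIC, start)
        if pos == -1:
            break
        if b"avro.schema" in data[pos:pos + window]:
            offsets.append(pos)
        start = pos + 1
    return offsets


# Usage (file name taken from the getschema command above):
# with open("FlumeData.1503479843633", "rb") as f:
#     print(find_container_headers(f.read()))
```

If more than one offset is reported, the file probably contains more than one header, which an Avro reader expecting a single container would not be able to parse past the first one.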

Did you find a solution? I'm facing the same problem. Any clues?