Hadoop: AvroRuntimeException occurs when executing some HQL in Hive

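I am working through the Hadoop 2.6.0 Twitter example using Flume 1.5.2 and Hive 0.14.0. I can successfully fetch data from Twitter through Flume and store it in my own HDFS.

However, when I try to analyze this data with Hive, even selecting a single field from the table fails with java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException, and I can find almost no useful information about this exception.

In fact, most of the records in the file are read successfully, as the output below shows: 5100 rows are fetched, but the query fails at the end. As a result, I cannot process all of the tweets files together.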

Time taken: 1.512 seconds, Fetched: 5100 row(s)   
Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
    15/04/15 19:59:18 [main]: ERROR CliDriver: Failed with exception java.io.IOException:org.apache.avro.AvroRuntimeException: java.io.EOFException
    java.io.IOException: org.apache.avro.AvroRuntimeException: java.io.EOFException
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:663)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:561)
        at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:138)
        at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1621)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:267)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: org.apache.avro.AvroRuntimeException: java.io.EOFException
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:222)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:153)
        at org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.next(AvroGenericRecordReader.java:52)
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:629)
        ... 15 more
    Caused by: java.io.EOFException
        at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
        at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
        at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
        at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
        at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
        at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
        at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
        at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
        at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
        ... 18 more
I created the table with the following HQL:

CREATE TABLE tweets
  ROW FORMAT SERDE
     'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
     'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES ('avro.schema.url'='file:///home/hduser/hive-0.14.0-bin/tweetsdoc_new.avsc');
Then I loaded the tweets file from HDFS:

LOAD DATA INPATH '/user/flume/tweets/FlumeData.1429098355304' OVERWRITE INTO TABLE tweets;

Can anyone tell me the possible cause, or an effective way to find more details about the exception?

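I ran into this annoying problem too.

I looked at the generated binary file and debugged the Avro deserialization a bit.

The cause of this EOFException is that Flume inserts a newline byte after every event; you can see a 0x0A after each record.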
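One way to look for that byte yourself is to dump the end of a local copy of the data file. The following is only a minimal sketch: it assumes you have already copied the file out of HDFS (for example with hdfs dfs -get), and the helper names tail_bytes and ends_with_newline are made up for illustration. A trailing 0x0A is a hint, not proof, since valid Avro data could end in that byte by coincidence.

```python
import os


def tail_bytes(path, count=32):
    """Return the last `count` bytes of the file (fewer if the file is shorter)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Seek to `count` bytes before the end, clamped at the start of the file.
        f.seek(max(0, size - count))
        return f.read()


def ends_with_newline(path):
    """Return True if the final byte of the file is 0x0A."""
    return tail_bytes(path, 1) == b"\n"
```

For a local copy named FlumeData.1429098355304, print(tail_bytes("FlumeData.1429098355304").hex()) shows the last bytes in hex, so a trailing 0a is easy to spot.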

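The Avro deserializer therefore thinks the file is not finished and interprets that byte as the number of blocks to read, but it cannot read that many blocks without hitting EOF.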
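To see why a single stray byte can be read as a count: Avro encodes long values as zigzag variable-length integers, and in the container format a data block begins with such a long holding the number of objects in it. Assuming that is the count meant above, the sketch below decodes one Avro long by hand; read_avro_long is a hypothetical helper written for illustration, not part of the Avro library.

```python
def read_avro_long(buf, pos=0):
    """Decode one Avro long (zigzag varint) from `buf` starting at `pos`.

    Returns (value, next_pos). Raises EOFError if the bytes run out first.
    """
    shift = 0
    raw = 0
    while True:
        if pos >= len(buf):
            raise EOFError("ran out of bytes while reading a long")
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry data; the high bit says another byte follows.
        raw |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            break
        shift += 7
    # Undo the zigzag mapping: 0, 1, 2, 3, 4 ... -> 0, -1, 1, -2, 2 ...
    return (raw >> 1) ^ -(raw & 1), pos


print(read_avro_long(b"\x0a"))  # (5, 1)
```

A lone 0x0A thus decodes to the value 5, so the reader goes on to expect data that is not there and runs into the end of the file.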