Java: Avro files written to HDFS result in an invalid block size
Tags: java, hadoop, hdfs, cloudera, avro

When reading files back from HDFS, I frequently see the following error:
{"id":"646626691524096003","user_friends_count":{"int":83},"user_location":{"string":"他の星から副都心線経由"},"user_description":{"string":"Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.IOException: Block size invalid or too large for this implementation: -40
at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:275)
at org.apache.avro.file.DataFileStream.hasNext(DataFileStream.java:197)
at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:77)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
Caused by: java.io.IOException: Block size invalid or too large for this implementation: -40
at org.apache.avro.file.DataFileStream.hasNextBlock(DataFileStream.java:266)
... 4 more
This happens when we try to read the files with various tools, for example:
$ java -jar ~/avro-tools-1.7.7.jar tojson FlumeData.1443002797525
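The machine writing the files to HDFS is a laptop on a flaky connection, so it is quite likely being disconnected regularly. Even so, corrupted files are not really expected. In this case the reader appears to hit the invalid block size about 11% of the way through the file (according to vim's estimate).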
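One way a negative size such as -40 could arise, offered as an illustration rather than a confirmed diagnosis for this file: per the Avro specification, each block in an object container file starts with an object count and a byte size, both stored as variable-length zig-zag longs. If an interrupted write leaves the reader misaligned, an ordinary data byte may end up being interpreted as the block size. The sketch below uses only the JDK; the class name and the `readZigZagLong` helper are made up for this example and are not part of the Avro API.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class AvroBlockSizeDemo {

    // Decode one long in Avro's binary encoding: a variable-length integer
    // (7 bits per byte, least-significant group first) holding a zig-zag value.
    static long readZigZagLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) {
                throw new EOFException("stream ended inside a varint");
            }
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // high bit clear: this was the last byte
            }
            shift += 7;
            if (shift > 63) {
                throw new IOException("varint too long");
            }
        }
        // Undo zig-zag: even raw values are non-negative, odd ones negative.
        return (raw >>> 1) ^ -(raw & 1);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical misaligned input: a plain ASCII byte ('O' = 0x4F)
        // sitting where the reader expects the block size.
        byte[] stray = {'O'};
        System.out.println(readZigZagLong(new ByteArrayInputStream(stray))); // prints -40
    }
}
```

Under this assumption, a single text byte is enough to produce the reported value, which would be consistent with the reader losing its place partway through a block of tweet text.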
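FWIW, I think the particular user description it was about to read out belongs to Twitter user @MyTime0627.

You can check out this post. I ran into this problem as well. The JSON SerDe and the Avro SerDe cannot both process the events at the same time.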