
Java: Error when writing to OrcNewOutputFormat using MapReduce MultipleOutputs


We read data from ORC files and use MultipleOutputs to write it out in both ORC and Parquet format. Our job is map-only, with no reducers. In some cases we hit the errors below, and they cause the whole job to fail. I believe the two errors are related, but I am not sure why they do not show up on every run. Let me know if you need more information.

Error: java.lang.RuntimeException: Overflow of newLength. smallBuffer.length=1073741824, nextElemLength=300947

Error: java.lang.ArrayIndexOutOfBoundsException: 1000
    at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
    at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
    at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:546)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs$RecordWriterWithCounter.close(MultipleOutputs.java:375)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)


Error: java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.orc.impl.DynamicByteArray.add(DynamicByteArray.java:115)
    at org.apache.orc.impl.StringRedBlackTree.addNewKey(StringRedBlackTree.java:48)
    at org.apache.orc.impl.StringRedBlackTree.add(StringRedBlackTree.java:60)
    at org.apache.orc.impl.writer.StringTreeWriter.writeBatch(StringTreeWriter.java:70)
    at org.apache.orc.impl.writer.StructTreeWriter.writeRootBatch(StructTreeWriter.java:56)
    at org.apache.orc.impl.WriterImpl.addRowBatch(WriterImpl.java:546)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushInternalBatch(WriterImpl.java:297)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:334)
    at org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat$OrcRecordWriter.close(OrcNewOutputFormat.java:67)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs$RecordWriterWithCounter.close(MultipleOutputs.java:375)
    at org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.close(MultipleOutputs.java:574)
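
For context, here is a minimal, hypothetical sketch of how such a map-only job wired through MultipleOutputs might look. The class name PassThroughMapper, the named-output label "orc", and the pass-through of the input value are illustrative assumptions; the question does not include the actual job code, and a real job would convert each row into the Writable form its output format expects (the Parquet named output is left as a comment for that reason).

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat;
import org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat;

public class OrcMultiOutputJob {

    // Map-only: every input row is forwarded to the named outputs, no reducer.
    public static class PassThroughMapper
            extends Mapper<NullWritable, Writable, NullWritable, Writable> {

        private MultipleOutputs<NullWritable, Writable> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(NullWritable key, Writable value, Context context)
                throws IOException, InterruptedException {
            // A real job would serialize the row into whatever Writable the
            // ORC (and Parquet) writers expect before calling write().
            mos.write("orc", NullWritable.get(), value, "orc/part");
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // The stack traces above are thrown while this close() flushes
            // the per-output ORC writers.
            mos.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "orc-multi-output");
        job.setJarByClass(OrcMultiOutputJob.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);                          // map-only job
        job.setInputFormatClass(OrcNewInputFormat.class);  // read from ORC
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // One named output per target format; the Parquet output format class
        // is omitted because the question does not say which one is used.
        MultipleOutputs.addNamedOutput(job, "orc",
                OrcNewOutputFormat.class, NullWritable.class, Writable.class);
        // MultipleOutputs.addNamedOutput(job, "parquet", ...);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}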

In my case the solution was to change orc.rows.between.memory.checks (or spark.hadoop.orc.rows.between.memory.checks) from 5000 (the default) to 1, because the ORC writer does not seem to be able to handle adding unusually large rows to a stripe.


The value can probably be tuned further to strike a better balance between safety and performance.
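
A hedged sketch of how that workaround could be applied when the job is configured in Java (the property name is the one given above; the Spark variant is the same key behind the spark.hadoop. prefix):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OrcMemoryCheckWorkaround {
    public static Job newJob() throws Exception {
        Configuration conf = new Configuration();
        // Default is 5000; checking after every row lets the writer flush a
        // stripe before an unusually large row overflows its internal buffer,
        // at the cost of more frequent memory checks.
        conf.setInt("orc.rows.between.memory.checks", 1);
        return Job.getInstance(conf, "orc-multi-output");
    }
}

// When launching through Spark instead:
//   spark-submit --conf spark.hadoop.orc.rows.between.memory.checks=1 ...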

Did you ever find out the cause?