
Exception: Spark wholeTextFiles - java.lang.OutOfMemoryError


Trying to read a large text file (> 4GB) with sc.wholeTextFiles().

The job fails with java.lang.OutOfMemoryError:

    at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
    at org.spark-project.guava.io.ByteStreams.copy(ByteStreams.java:211)
    at org.spark-project.guava.io.ByteStreams.toByteArray(ByteStreams.java:252)
    at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:83)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:143)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1467)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1006)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1006)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1498)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
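
For context, a minimal sketch of the kind of driver program that produces this trace; the app name and input path are hypothetical:

    import org.apache.spark.{SparkConf, SparkContext}

    object WholeTextFilesOom {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("wholeTextFiles-oom"))
        // wholeTextFiles returns an RDD[(path, contents)]: each file becomes a
        // single record, so the record reader buffers the whole file in memory.
        val files = sc.wholeTextFiles("hdfs:///data/input/") // hypothetical path
        println(files.count()) // count() forces every file to be read, as in the trace
        sc.stop()
      }
    }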

Because a Java application is only allowed a limited amount of memory, the JVM throws a Java heap space error, java.lang.OutOfMemoryError, whenever it hits its heap size limit. Here the trace shows the limit being hit inside WholeTextFileRecordReader, which copies the entire file into a single in-memory byte array (ByteStreams.toByteArray), so each file read via wholeTextFiles() must fit completely in one task's heap.
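
If you are unsure what limit a JVM is actually running with, you can print it from the driver (or inside a task) using the standard Runtime API; a quick sketch:

    // maxMemory reports the largest heap this JVM will attempt to use (the -Xmx value)
    println(s"max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")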

The default heap size limit is 1G (Spark's driver and executors likewise default to 1g), but it can be raised by passing the -Xmx option to the JVM.

You can raise it to 4G like this:

java -Xmx4g
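
For a Spark job, though, the driver and executor JVMs are launched by Spark itself, so the heap is normally raised through spark-submit flags (equivalently the spark.driver.memory and spark.executor.memory properties) rather than a bare java command; the class and jar names below are placeholders:

    # --driver-memory and --executor-memory set -Xmx for the respective JVMs
    spark-submit \
      --driver-memory 4g \
      --executor-memory 4g \
      --class com.example.WholeTextFilesOom \
      my-app.jar

One caveat: a single Java byte array tops out near 2GB (Integer.MAX_VALUE bytes), which is why the trace ends in ByteArrayOutputStream.hugeCapacity. A file larger than that can never fit in one wholeTextFiles record no matter how large the heap, so for multi-gigabyte files a line-oriented read such as sc.textFile() is the safer choice.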